Monday, November 27, 2023

5 Macro Trends in HPC

High Performance Computing (HPC) is the field of accelerating computing operations and data processing to increase accuracy, reduce time to solution, or both. Since the 1960s, HPC has been shaped by pushing computing to explore and conquer new frontiers (it is only fitting that the first exascale system was named Frontier!). In the last decade, however, other factors have shaped the priorities and direction of the HPC community, and each of these factors influences the others. For instance, the convergence of AI/ML with HPC drives increased adoption of accelerators, since GPUs are particularly well suited to AI/ML workloads as well as to visualization.

Let’s look at 5 macro trends that have had a significant impact.

Accelerators for HPC

Graphics Processing Units (GPUs) were initially built for graphical rendering and 3D video games. Once GPUs became programmable, they were adopted to accelerate HPC and AI/ML workloads. Accelerators today extend beyond GPUs to include FPGAs and other domain-specific hardware such as tensor processing units (TPUs).

GPUs are an attractive computing resource because they offer higher peak performance with a smaller computational resource footprint, lowering the cost to result. The oil and gas community was an early adopter of GPUs for HPC. Oil and gas companies use seismic imaging methods to detect what lies beneath the ground or seabed without having to drill. These are complex algorithms that require processing, analyzing and visualizing large datasets. In the early part of the last decade, there was a conscious effort to take advantage of the GPU architecture to run the parallel computations in seismic analysis and interpretation models. Today, accelerators are being investigated across many HPC domains, including atmospheric and climate modeling and genomic sequencing. Significant speedups can be achieved by porting portions of these models, or entire applications, to run on GPUs. At the same time, the literature indicates that accelerator adoption for traditional HPC simulations is held back by the daunting task of porting entire applications to take full advantage of the potential speedup.
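To make the porting idea concrete, here is a minimal sketch (in Python, using CuPy as a drop-in, NumPy-like library) of offloading one numerically intensive step to a GPU while the rest of the workflow stays on the CPU. The stencil-style smoothing loop, array sizes and iteration count are illustrative placeholders rather than anything taken from a real seismic code.

```python
# Minimal sketch: offload one hot loop to the GPU, keep the rest of the
# workflow on the CPU. All sizes and constants are illustrative only.
import numpy as np

try:
    import cupy as xp          # runs on the GPU if CuPy and a CUDA device are available
    on_gpu = True
except ImportError:
    xp = np                    # falls back to NumPy on the CPU
    on_gpu = False

def smooth_wavefield(field, iterations=50):
    """Repeatedly average each cell with its four neighbours (a toy stand-in
    for the stencil updates found in wave-propagation kernels)."""
    f = xp.asarray(field)
    for _ in range(iterations):
        f = 0.25 * (xp.roll(f, 1, axis=0) + xp.roll(f, -1, axis=0) +
                    xp.roll(f, 1, axis=1) + xp.roll(f, -1, axis=1))
    # Bring the result back to host memory for the CPU-side rest of the workflow.
    return xp.asnumpy(f) if on_gpu else f

field = np.random.rand(2048, 2048).astype(np.float32)
print(smooth_wavefield(field).shape)
```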

AI/ML & HPC

Artificial Intelligence (AI) is the umbrella term for the combined areas of machine learning, deep learning and robotics. Over the last decade, AI/ML has increasingly been applied to solving science problems. HPC workflows have integrated AI/ML models to achieve higher modeling accuracy and to improve the computational efficiency of traditional methods. For instance, inserting reinforcement-learning-based neural architecture search into the workflow for building models of complex diseases such as cancer has been shown to improve training time by 2x or more and reduce the number of trainable parameters by 10x or more. This has the potential to significantly accelerate cancer research. Machine learning algorithms are also used in other HPC workflows such as materials engineering, materials science, molecular dynamics and genomics.
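One common pattern behind this kind of integration (and a much simpler one than the neural architecture search example above) is the ML surrogate: a cheap model trained on a limited number of expensive simulation runs, then used to screen candidate inputs so the full simulation only runs where it matters. The sketch below is purely illustrative; the expensive_simulation function, sample counts and the choice of a gradient-boosted regressor are all assumptions for the example.

```python
# Illustrative sketch of a surrogate-assisted HPC workflow, not the NAS
# workflow described above. Everything here is a hypothetical placeholder.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

def expensive_simulation(x):
    # Stand-in for a costly HPC run; here just a nonlinear function of the inputs.
    return np.sin(3 * x[0]) + x[1] ** 2

# 1. Run the expensive model a limited number of times to build training data.
train_x = rng.uniform(-1, 1, size=(64, 2))
train_y = np.array([expensive_simulation(x) for x in train_x])

# 2. Fit a surrogate that is far cheaper to evaluate than the simulation.
surrogate = GradientBoostingRegressor().fit(train_x, train_y)

# 3. Screen many candidates with the surrogate; keep the top few for real runs.
candidates = rng.uniform(-1, 1, size=(10_000, 2))
predicted = surrogate.predict(candidates)
top = candidates[np.argsort(predicted)[-5:]]           # 5 most promising inputs
confirmed = [expensive_simulation(x) for x in top]     # only these hit the HPC system
print(confirmed)
```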

Another use of AI in HPC workflows comes from the concept of the digital twin: a virtual simulation of a real-world process, product or place. By creating the digital twin of a product, say a car, an automotive manufacturer can realistically simulate the vehicle design. The virtual replica is a "twin" of the physical product and, in the case of automotive manufacturing, brings together technologies such as Internet of Things (IoT) sensors, AI/ML and HPC simulation tools. The digital twin enables simulation and analysis of elements of the design and manufacturing process, such as crash simulations, vibration and noise testing, and the design of safety systems, to optimize the product before it is built. Digital twins are also used in robotics development and autonomous driving, and to improve workflows and services across multiple industries.
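At its core, a digital twin is a model state that is continuously updated from sensor data and then queried for decisions. The toy sketch below shows that loop for a single hypothetical brake component; the update rule, thresholds and sensor values are invented for illustration and bear no relation to real vehicle physics.

```python
# Toy sketch of the digital-twin loop: ingest sensor readings, update a
# virtual model, query it for decisions. All numbers are hypothetical.
from dataclasses import dataclass

@dataclass
class BrakeTwin:
    temperature_c: float = 20.0     # modelled pad temperature
    wear_mm: float = 0.0            # modelled pad wear

    def ingest(self, brake_pressure_kpa: float, dt_s: float) -> None:
        # Extremely simplified update rule: heating and wear scale with
        # pressure, with passive cooling toward ambient temperature.
        self.temperature_c += 0.001 * brake_pressure_kpa * dt_s
        self.temperature_c -= 0.05 * (self.temperature_c - 20.0) * dt_s
        self.wear_mm += 1e-6 * brake_pressure_kpa * dt_s

    def needs_service(self) -> bool:
        return self.wear_mm > 2.0 or self.temperature_c > 400.0

twin = BrakeTwin()
for pressure in [800.0, 1200.0, 300.0] * 100:   # simulated IoT sensor stream
    twin.ingest(pressure, dt_s=0.1)
print(twin, twin.needs_service())
```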

HPDA in HPC

HPDA, or High Performance Data Analytics, is the convergence of Big Data and HPC: the use of HPC technologies and systems to analyze very large amounts of data. HPDA has some fundamental architectural considerations that are independent of its integration into an HPC environment, such as data ingestion and the data science tools used for analysis. As with the other macro trends, HPDA has become increasingly important in HPC as a result of related factors such as the integration of AI/ML into HPC workflows. There is a need to handle very large datasets and to run concurrent analytics, AI/ML algorithms, visualization, and traditional modeling and simulation algorithms as part of the HPC workflow. HPDA enhances the solution of complex scientific and analytical problems such as climate modeling and real-time fraud detection.
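The basic HPDA pattern (partition a large dataset, analyze the pieces concurrently, combine the partial results) can be shown at toy scale with nothing but the Python standard library. Production HPDA stacks use distributed frameworks running across HPC clusters; the chunk count, worker count and per-chunk statistic below are illustrative assumptions.

```python
# Minimal sketch of the HPDA pattern at toy scale: split, analyze chunks
# concurrently, combine. Sizes and the per-chunk statistic are illustrative.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def analyze_chunk(chunk: np.ndarray) -> tuple[float, int]:
    # Per-chunk partial statistic; a real workflow might run anomaly
    # detection or an ML inference step here instead.
    return float(chunk.sum()), chunk.size

def main() -> None:
    data = np.random.rand(10_000_000)        # stand-in for a very large dataset
    chunks = np.array_split(data, 16)
    with ProcessPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(analyze_chunk, chunks))
    total, count = map(sum, zip(*partials))
    print("global mean:", total / count)

if __name__ == "__main__":                   # required for process pools on some platforms
    main()
```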

HPC in the Cloud and as-a-Service

The last 15 years have been defined by the growth and near-ubiquitous adoption of the public cloud. The HPC community has been slow to jump on this bandwagon. Over the last five years, however, there has been a surge of organizations evaluating the fit of the public cloud and various as-a-service options for their HPC needs. At the same time, traditional users of large-scale HPC systems are shaping how the cloud must evolve to meet the unique scaling needs of HPC workloads. Some customers use the cloud purely as an infrastructure source, treating cloud (or as-a-service) compute and storage resources as an extension of their on-premises systems. Independent software vendors (ISVs) are also providing options for integration with the cloud platform of choice. One of the big considerations in cloud adoption is whether HPC workloads can scale with a do-it-yourself lift-and-shift approach, or whether creating cloud-native solutions for HPC applications is worthwhile from a cost-benefit standpoint.
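That cost-benefit question often starts with a back-of-the-envelope comparison like the one sketched below: at what utilization does owning hardware beat renting comparable cloud capacity? Every price and assumption in the sketch is a hypothetical placeholder, not a quote from any provider.

```python
# Back-of-the-envelope sketch of the lift-and-shift cost question.
# Every number below is a hypothetical placeholder, not a real price.
HOURS_PER_YEAR = 8760

on_prem_capex_per_node = 30_000.0      # purchase price amortized over 4 years (assumed)
on_prem_opex_per_node_year = 4_000.0   # power, cooling, admin per year (assumed)
cloud_price_per_node_hour = 3.00       # on-demand rate for a comparable instance (assumed)

on_prem_cost_year = on_prem_capex_per_node / 4 + on_prem_opex_per_node_year

for utilization in (0.2, 0.4, 0.6, 0.8):
    cloud_cost_year = cloud_price_per_node_hour * HOURS_PER_YEAR * utilization
    cheaper = "cloud" if cloud_cost_year < on_prem_cost_year else "on-prem"
    print(f"utilization {utilization:.0%}: cloud ${cloud_cost_year:,.0f} "
          f"vs on-prem ${on_prem_cost_year:,.0f} -> {cheaper}")
```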

Other Factors

Shortage of Skilled Workforce

As far back as almost ten years ago, a global shortage of skilled workers was identified in the field of engineering in general and in the specialized area of high-performance computing in particular. This trend has not reversed significantly since then. With the CHIPS and Science Act of 2022, the US Government acknowledged that there is a shortage of skilled technical workers to meet rising computing demands and to drive innovation and technology leadership. While this scarcity extends to the tech community at large, the macro trend of adopting cloud and as-a-service options for HPC has had unintended consequences for the general availability of system administrators and of simulation and algorithm specialists.

Sustainability

Sustainability, power efficiency and cooling efficiency have become leading considerations when building systems and running HPC applications. With the recent explosion of generative AI, the scale of and demand for computing resources has increased dramatically. Data movement is also energy-expensive. There is a need for innovation in the architecture of computing and networking technologies, in the algorithms that run on them, and in datacenter efficiency.
