NVIDIA's AI Platform Sets New Records in MLPerf Benchmarks

NVIDIA's AI platform has achieved remarkable results in the latest MLPerf industry benchmarks, raising the bar for AI training and high-performance computing. One standout achievement is in generative AI, where NVIDIA Eos, an AI supercomputer powered by 10,752 NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking, completed a training benchmark based on a GPT-3 model with 175 billion parameters trained on one billion tokens in just 3.9 minutes. This represents a nearly 3x improvement from the previous record set by NVIDIA less than six months ago.

The benchmark utilized a portion of the full GPT-3 dataset behind the popular ChatGPT service. Extrapolating from these results, Eos could now train the entire dataset in just eight days, which is 73x faster than a previous state-of-the-art system using 512 A100 GPUs. This significant reduction in training time leads to cost savings, energy efficiency, and faster time-to-market. It also enables the widespread adoption of large language models (LLMs) by businesses, facilitated by tools like NVIDIA NeMo, a framework for customizing LLMs.

In another generative AI test, 1,024 NVIDIA Hopper architecture GPUs completed a training benchmark based on the Stable Diffusion text-to-image model in just 2.5 minutes, setting a high standard for this new workload. By incorporating these two tests, MLPerf solidifies its position as the industry standard for measuring AI performance, recognizing the transformative potential of generative AI.

System Scaling Soars

The impressive results were made possible by utilizing the highest number of accelerators ever applied to an MLPerf benchmark. The 10,752 H100 GPUs surpassed the scaling achieved in June, when NVIDIA used 3,584 Hopper GPUs.

The 3x increase in GPU numbers resulted in a 2.8x improvement in performance, with a remarkable 93% efficiency rate, thanks in part to software optimizations.

Efficient scaling is crucial in generative AI due to the exponential growth of LLMs each year. NVIDIA's ability to meet this unprecedented challenge demonstrates its commitment to supporting even the largest data centers.

This achievement is the result of a comprehensive platform of innovations in accelerators, systems, and software, which were utilized by both Eos and Microsoft Azure in the latest round of testing.

Eos and Azure both employed 10,752 H100 GPUs in separate submissions and achieved nearly identical performance, demonstrating the efficiency of NVIDIA AI in both data center and public-cloud deployments.

NVIDIA relies on Eos for various critical tasks, including advancing initiatives like NVIDIA DLSS, an AI-powered software for cutting-edge computer graphics, and NVIDIA Research projects like ChipNeMo, which develops generative AI tools for designing next-generation GPUs.

Advances Across Workloads

In addition to making strides in generative AI, NVIDIA set several new records in other areas during this round of testing.

For example, H100 GPUs were 1.6x faster than the previous-round training recommender models widely used to assist users in finding information online. Performance also improved by 1.8x on RetinaNet, a computer vision model.

These advancements were achieved through a combination of software improvements and scaled-up hardware.

NVIDIA was the only company to run all MLPerf tests once again. H100 GPUs demonstrated the fastest performance and the greatest scaling in each of the nine benchmarks.

These speedups translate to faster time-to-market, reduced costs, and energy savings for users training massive LLMs or customizing them with frameworks like NeMo to meet their specific business needs.

Eleven system makers utilized the NVIDIA AI platform in their submissions for this round, including ASUS, Dell Technologies, Fujitsu, GIGABYTE, Lenovo, QCT, and Supermicro.

NVIDIA partners participate in MLPerf because they recognize its value as a tool for evaluating AI platforms and vendors.

HPC Benchmarks Expand

In the MLPerf HPC benchmark, which focuses on AI-assisted simulations on supercomputers, H100 GPUs delivered up to twice the performance of NVIDIA A100 Tensor Core GPUs in the previous HPC round. These results represent up to 16x gains since the first MLPerf HPC round in 2019.

The benchmark included a new test that trained OpenFold, a model that predicts the 3D structure of a protein from its sequence of amino acids. OpenFold can perform vital work for healthcare in minutes, work that used to take researchers weeks or months.

Understanding a protein's structure is crucial for quickly identifying effective drugs, as most drugs act on proteins, which play a significant role in biological processes.

In the MLPerf HPC test, H100 GPUs trained OpenFold in just 7.5 minutes. This test is representative of the entire AlphaFold training process, which previously took 11 days using 128 accelerators.

A version of the OpenFold model and the software used by NVIDIA to train it will soon be available in NVIDIA BioNeMo, a generative AI platform for drug discovery.

Several partners, including Dell Technologies and supercomputing centers at Clemson University, the Texas Advanced Computing Center, and Lawrence Berkeley National Laboratory with assistance from Hewlett Packard Enterprise (HPE), made submissions on the NVIDIA AI platform for this round.

Benchmarks With Broad Backing

Since its establishment in May 2018, the MLPerf benchmarks have received widespread support from both industry and academia. Organizations such as Amazon, Arm, Baidu, Google, Harvard, HPE, Intel, Lenovo, Meta, Microsoft, NVIDIA, Stanford University, and the University of Toronto endorse these benchmarks.

MLPerf tests are transparent and objective, allowing users to rely on the results when making informed purchasing decisions.

All the software used by NVIDIA is available from the MLPerf repository, ensuring that all developers can achieve the same world-class results. These software optimizations are continuously integrated into containers available on NGC, NVIDIA's software hub for GPU applications.