Blackwell’s MLPerf sweep shows why training speeds up
5 Blackwell MLPerf 6.0 results show faster training, bigger scale, and better reliability for frontier AI teams.

Blackwell led MLPerf Training 6.0 with faster training, larger scale, and stronger reliability.
In MLPerf Training 6.0, NVIDIA Blackwell posted the fastest time to train on all seven benchmarks and scaled to 8,192 GPUs.
| Item | Scale | Reported result |
|---|---|---|
| GB300 NVL72 | Rack-scale | Up to 1.6x faster than GB200 NVL72 |
| DeepSeek-V3 671B | 8,192 GPUs | Fastest time to train at the largest scale |
| Llama 3.1 405B on Azure | 8,192 GPUs | Reference quality in 7.07 minutes |
| DeepSeek-V3 671B on CoreWeave | 8,192 GPUs | Reference quality in 2.02 minutes |
| Higgsfield on Nebius | Cloud deployment | 30% shorter training time |
1. Fastest training across all seven benchmarks
The headline result is simple: NVIDIA was the only platform submitted across every benchmark in MLPerf Training 6.0, and it delivered the fastest time to train in all seven. That matters because MLPerf is a peer-reviewed benchmark suite, so the results are meant to compare real systems, not marketing claims.

For teams choosing training infrastructure, this is the clearest signal in the batch. It says Blackwell is not tuned for one model family or one lab setup. It is being pushed across dense LLMs, mixture-of-experts workloads, and fine-tuning cases with the same goal: finish training sooner.
- Seven-for-seven fastest time to train
- Submitted on both GB200 NVL72 and GB300 NVL72
- Included new MoE workloads: DeepSeek-V3 671B and GPT-OSS-20B
2. GB300 NVL72’s speed jump over GB200 NVL72
Blackwell Ultra, the GPU generation inside GB300 NVL72, matters because it raises the ceiling within the same rack-scale design. NVIDIA reported that GB300 NVL72 delivered up to 1.6x faster training than GB200 NVL72 at the same scale, driven by higher compute density with NVFP4, more memory, and a higher power ceiling.
That mix is useful when a model is already large enough that small gains in throughput compound into real schedule savings. If you are running long pretraining jobs or repeated fine-tunes, a 1.6x gain can change how many experiments fit into a week.
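To make the scheduling point concrete, here is a minimal back-of-the-envelope sketch. The 40-hour run length is a hypothetical figure chosen for illustration, not a number from the MLPerf results, and the 1.6x factor is the reported upper bound, so real gains may be smaller.

```python
def experiments_per_week(hours_per_run: float, speedup: float = 1.0,
                         hours_available: float = 7 * 24) -> int:
    """Whole training runs that fit back to back in the available hours."""
    if hours_per_run <= 0 or speedup <= 0:
        raise ValueError("hours_per_run and speedup must be positive")
    # A speedup of s shortens each run to hours_per_run / s.
    return int(hours_available // (hours_per_run / speedup))


# Hypothetical 40-hour run; 1.6x is the "up to" figure reported for GB300 NVL72.
baseline = experiments_per_week(40.0)
faster = experiments_per_week(40.0, speedup=1.6)
print(f"baseline: {baseline} runs/week, at 1.6x: {faster} runs/week")
```

Under these assumptions the weekly count goes from four full runs to six, which is the kind of difference that changes an experiment plan.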
Key drivers of the GB300 NVL72 gain:
- Higher compute density with NVFP4
- Expanded memory capacity
- Higher power ceiling for sustained performance
3. 8,192-GPU scale for MoE and dense models
Scale is the other half of the story. NVIDIA scaled DeepSeek-V3 671B to 8,192 GPUs on GB200 NVL72 systems, which is the largest Blackwell-based submission in MLPerf Training to date. It also submitted Llama 3.1 405B at 5,120 GPUs, showing that the platform is not only about peak single-job speed but also about how far the cluster can stretch.

The networking piece is what makes that scale practical. Within each rack, fifth-generation NVLink Switches connect all 72 GPUs into a shared pool of compute and memory. For distributed clusters, NVIDIA pairs that with Quantum InfiniBand or Spectrum-X Ethernet, depending on the data center design.
- DeepSeek-V3 671B: 8,192 GPUs
- Llama 3.1 405B: 5,120 GPUs
- Rack-scale NVLink Switch fabric across 72 GPUs
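For a rough sense of physical footprint, the GPU counts above can be turned into a lower bound on rack count. This sketch assumes every rack is fully populated with 72 GPUs; the actual submissions may have used fewer GPUs per rack, so the real number of racks could be higher.

```python
GPUS_PER_RACK = 72  # NVL72 rack size cited in the article


def min_racks(gpu_count: int, gpus_per_rack: int = GPUS_PER_RACK) -> int:
    """Smallest number of racks that can hold gpu_count GPUs (ceiling division)."""
    if gpu_count < 0 or gpus_per_rack <= 0:
        raise ValueError("gpu_count must be >= 0 and gpus_per_rack must be > 0")
    return (gpu_count + gpus_per_rack - 1) // gpus_per_rack


for name, gpus in [("DeepSeek-V3 671B", 8192), ("Llama 3.1 405B", 5120)]:
    print(f"{name}: {gpus} GPUs -> at least {min_racks(gpus)} racks")
```

Everything beyond a single rack depends on the cluster fabric (InfiniBand or Ethernet) rather than NVLink alone, which is why the networking layer matters at this scale.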
4. Partner results that show the platform in production
The most useful part of NVIDIA’s blog post may be the partner examples, because they show Blackwell outside the company’s own test cases. Cohere reported 3x faster training on GB200 NVL72 for its North agentic AI platform. Midjourney trained v8 on a Blackwell cluster and is now scaling a large fleet of Blackwell Ultra GPUs on CoreWeave for upcoming image and video models.
There are more signs that the platform is already in production use. Microsoft Azure reached reference quality on Llama 3.1 405B in 7.07 minutes, CoreWeave hit 2.02 minutes on DeepSeek-V3 671B with GB300 NVL72, and Nebius said Higgsfield cut training time by 30% while serving 22 million users and generating over 6 million AI outputs per day.
- Cohere: 3x faster training on GB200 NVL72
- Midjourney: training and scaling on Blackwell Ultra GPUs
- Thinking Machines Lab on Google Cloud: 2x faster training and serving
- Nebius and Higgsfield: 30% shorter training time
5. Reliability features for long training runs
Performance only matters if a job survives long enough to finish. NVIDIA frames Blackwell’s reliability story around fewer interruptions and faster recovery. Before a GPU reaches a data center, it goes through 30+ manufacturing test stages. In operation, the Reliability, Availability and Serviceability (RAS) Engine monitors nearly the entire chip, while self-healing logic can route around faults without stopping the workload.
At the cluster level, Spectrum-X Ethernet can reroute around failed links in milliseconds. If a fault does interrupt a job, NVIDIA Resiliency Extension, or NVRx, helps resume from a recent checkpoint instead of restarting from zero. That is especially relevant for runs that span weeks or months across hundreds of thousands of GPUs.
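The general idea of resuming from a recent checkpoint can be illustrated with a small standalone sketch. This is not the NVRx API: the function names, the JSON checkpoint file, and the simulated fault are all hypothetical stand-ins, and a real training job would save model and optimizer state instead of a counter.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash mid-write leaves the old file intact."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "work_done": 0}


def train(path, total_steps, checkpoint_every, fail_at=None):
    """Run (or resume) a toy training loop; returns (start_step, final_state)."""
    state = load_checkpoint(path)
    start = state["step"]
    for step in range(start, total_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated fault at step {step}")
        state["work_done"] += 1  # stand-in for one real training step
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return start, state


with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "ckpt.json")
    try:
        train(ckpt, total_steps=100, checkpoint_every=10, fail_at=57)
    except RuntimeError:
        pass  # the job died; only progress up to the last checkpoint survives
    resumed_from, final = train(ckpt, total_steps=100, checkpoint_every=10)
    print(f"resumed from step {resumed_from}, finished at step {final['step']}")
```

In this toy run the fault costs only the seven steps since the last checkpoint rather than all 57. The trade-off in any such scheme is the checkpoint interval: saving more often loses less work per fault but spends more time writing state.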
Reliability stack:
- 30+ manufacturing test stages
- RAS Engine monitoring
- Self-healing fault routing
- Spectrum-X link rerouting
- NVRx checkpoint recovery
How to decide
If you want the fastest benchmark story, look at the seven-for-seven MLPerf sweep and the GB300 NVL72 result. If your priority is cluster size, the 8,192-GPU DeepSeek-V3 671B run is the clearest proof point. If you care about real-world adoption, the partner wins from Cohere, Midjourney, Azure, CoreWeave, and Nebius are the strongest signals.
For most AI teams, the practical takeaway is that Blackwell is being positioned as a full training platform, not just a fast GPU. It combines speed, scale, and recovery features in a way that fits frontier model work, where every lost hour and every failed run has a cost.