Tag
TensorRT-LLM
TensorRT-LLM is NVIDIA’s optimization stack for LLM inference, focused on lower latency, higher throughput, and better GPU utilization. It often appears in coverage of MLPerf, Blackwell/GB300, and Dynamo, where it highlights how server performance depends as much on compilation, scheduling, and runtime software as on hardware.
3 articles

Research/Jun 19
Blackwell wins because agentic AI needs full-stack infrastructure
NVIDIA Blackwell is the right infrastructure bet for agentic AI because it delivers the best measured efficiency at scale.

Research/Apr 3
Nvidia’s MLPerf Gains Show Software Still Matters
Nvidia posted MLPerf gains of up to 2.77x on GB300 NVL72, with software such as Dynamo and TensorRT-LLM doing much of the heavy lifting.

Industry News/Apr 2
NVIDIA Sets New MLPerf Inference Records
Blackwell Ultra hit new MLPerf Inference v6.0 highs, with GB300 NVL72 gaining 2.7x on DeepSeek-R1 server tests and 1.5x on Llama 3.1 405B.