Tag

TensorRT-LLM

TensorRT-LLM is NVIDIA’s optimization stack for LLM inference, focused on lower latency, higher throughput, and better GPU utilization. It often appears alongside MLPerf, Blackwell/GB300, and Dynamo, a reminder that server performance depends on compilation, scheduling, and runtime software as much as on hardware.

3 articles

Research/Jun 19

Blackwell wins because agentic AI needs full-stack infrastructure

NVIDIA Blackwell is the right infrastructure bet for agentic AI because it delivers the best measured efficiency at scale.

Research/Apr 3

Nvidia’s MLPerf Gains Show Software Still Matters

Nvidia posted up to 2.77x MLPerf gains on GB300 NVL72, with software tricks like Dynamo and TensorRT-LLM doing the heavy lifting.

Industry News/Apr 2

NVIDIA Sets New MLPerf Inference Records

Blackwell Ultra hit new MLPerf Inference v6.0 highs, with GB300 NVL72 gaining 2.7x on DeepSeek-R1 server tests and 1.5x on Llama 3.1 405B.