[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-nvidia-ai-models-playbook-en":3,"article-related-nvidia-ai-models-playbook-en":30,"series-tools-bece181a-96c8-494b-ac0b-fb254413e051":84},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"bece181a-96c8-494b-ac0b-fb254413e051","nvidia-ai-models-playbook-en","NVIDIA AI Models turn model hunting into a playbook","\u003Cp data-speakable=\"summary\">\u003Ca href=\"\u002Ftag\u002Fnvidia\">NVIDIA\u003C\u002Fa>’s AI Models page turns model selection into a deployment playbook.\u003C\u002Fp>\u003Cp>I’ve been using model directories like this for a while now, and they usually annoy me in the same way: they look helpful until I’m actually trying to ship something. Then I’m bouncing between a dozen tabs, half of them marketing, half of them docs, and none of them answering the three questions I need answered first: what should I run, where should I run it, and what do I do when it’s too slow or too expensive?\u003C\u002Fp>\u003Cp>NVIDIA’s \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fai-models\">AI Models\u003C\u002Fa> page is better than most, but I still had to read it like a developer, not a brochure. The page is really a routing table. 
It points you from model families to deployment paths: \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fai-models#deepseek\">DeepSeek\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fai-models#gemma\">Gemma\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fai-models#gpt-oss\">gpt-oss\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fai-models#kimi\">Kimi\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fai-models#llama\">Llama\u003C\u002Fa>, and the rest. Once I stopped treating it like a catalog and started treating it like a decision tree, the whole thing made more sense.\u003C\u002Fp>\u003Cp>That shift matters because the page is not saying, “Here are some models, good luck.” It’s saying, “Pick a model family, then pick your path: prototype with NIM, optimize with TensorRT-LLM, customize with NeMo, or run locally with Ollama, \u003Ca href=\"\u002Ftag\u002Fvllm\">vLLM\u003C\u002Fa>, Hugging Face, or llama.cpp.” That’s the useful part. The rest is just branding noise.\u003C\u002Fp>\u003Cp>In this breakdown, I’m going to strip the page down into the actual workflow I’d use on a real project, plus a copy-ready template at the end you can reuse when you need to choose a model without turning your week into a \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> festival.\u003C\u002Fp>\u003Ch2>Stop reading it like a catalog\u003C\u002Fh2>\u003Cblockquote>\"Explore and deploy top AI models built by the community, accelerated by NVIDIA’s AI inference platform, and run on NVIDIA-accelerated infrastructure.\"\u003C\u002Fblockquote>\u003Cp>What this actually means is: NVIDIA wants this page to be the front door for model choice, but the real value is in how it routes you into deployment. 
The page is built around model families, then immediately pushes you toward the tools that make those models usable on NVIDIA hardware.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780771718084-8xiy.png\" alt=\"NVIDIA AI Models turn model hunting into a playbook\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>I ran into this when I was trying to decide whether a model was worth the trouble on a local GPU box versus a data center setup. The mistake I kept making was comparing \u003Ca href=\"\u002Fnews\u002Fwhy-gemini-drops-matter-more-than-model-names-en\">model names\u003C\u002Fa> instead of comparing operational paths. That’s backwards. A model that looks great on paper can be a pain if the only viable path is a stack you don’t want to maintain.\u003C\u002Fp>\u003Cp>On this page, each family comes with a pattern: explore samples, integrate with a runtime, optimize \u003Ca href=\"\u002Ftag\u002Finference\">inference\u003C\u002Fa>, then get a production-ready version. That’s the real structure. 
It’s less “browse models” and more “pick your route.”\u003C\u002Fp>\u003Cp>How to apply it:\u003C\u002Fp>\u003Cul>\u003Cli>Start with deployment constraints first: edge, workstation, single GPU, or cluster.\u003C\u002Fli>\u003Cli>Then check which runtime path the page suggests: \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fnim\">NVIDIA NIM\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTensorRT-LLM\">TensorRT-LLM\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm\">vLLM\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Follama.com\u002F\">Ollama\u003C\u002Fa>, or \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002F\">Hugging Face\u003C\u002Fa>.\u003C\u002Fli>\u003Cli>Only after that should you compare model size, architecture, and benchmark claims.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>The page keeps repeating this pattern because it’s trying to reduce the usual friction: model discovery, integration, optimization, deployment. That’s the actual workflow. Everything else is a subheading.\u003C\u002Fp>\u003Ch2>DeepSeek is the page’s performance-first example\u003C\u002Fh2>\u003Cblockquote>\"DeepSeek is a family of open-source models that features several powerful models using a mixture-of-experts (MoE) architecture and provides advanced reasoning capabilities.\"\u003C\u002Fblockquote>\u003Cp>What this actually means is: DeepSeek is the page’s example of a model family where architecture matters as much as capability. MoE changes the performance conversation because you’re not just asking “is it smart?” You’re asking “can I run this efficiently enough to matter?”\u003C\u002Fp>\u003Cp>The page leans hard into optimization here. 
It points to \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTensorRT-LLM\">TensorRT-LLM\u003C\u002Fa> for data center deployments, \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fnim\">NIM\u003C\u002Fa> for quick trials and production-ready packaging, and \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FNeMo\">NeMo\u003C\u002Fa> for customization. That trio tells me NVIDIA expects you to move from experiment to production without rewriting your stack three times.\u003C\u002Fp>\u003Cp>I’ve seen teams get stuck on model quality debates when the real blocker was throughput. A model can be brilliant and still be the wrong choice if it forces you into a cost profile your app can’t absorb. That’s why the page keeps surfacing performance notes, like the DeepSeek-R1 8K\u002F1K result showing a 15x performance benefit and revenue opportunity on Blackwell GB200 NVL72 over Hopper H200. I’m not using that as a universal promise; I’m using it as a signal that NVIDIA wants you to think in hardware terms, not just model terms.\u003C\u002Fp>\u003Cp>How to apply it:\u003C\u002Fp>\u003Cul>\u003Cli>If you’re evaluating a reasoning-heavy app, test DeepSeek first against your latency and token budget.\u003C\u002Fli>\u003Cli>Use \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTensorRT-LLM\">TensorRT-LLM\u003C\u002Fa> when you care about squeezing inference performance out of NVIDIA GPUs.\u003C\u002Fli>\u003Cli>Use \u003Ca href=\"https:\u002F\u002Fdocs.nvidia.com\u002Fnim\u002F\">NIM docs\u003C\u002Fa> when you want a packaged deployment path instead of building every layer yourself.\u003C\u002Fli>\u003Cli>Use \u003Ca href=\"https:\u002F\u002Fdocs.nvidia.com\u002Fnemo-framework\u002F\">NeMo docs\u003C\u002Fa> when your real problem is adapting the model to your data.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>The practical takeaway is simple: DeepSeek isn’t just a model family on this page, it’s a template for how NVIDIA wants you to think 
about open models on its hardware. Pick the model, then pick the acceleration path.\u003C\u002Fp>\u003Ch2>Gemma is the “works everywhere” story, if you read it correctly\u003C\u002Fh2>\u003Cblockquote>\"Gemma is Google DeepMind’s family of lightweight, open models.\"\u003C\u002Fblockquote>\u003Cp>What this actually means is: Gemma is the page’s answer when you need smaller models that still fit into a serious deployment story. The page calls out support across data center GPUs, Windows RTX, and Jetson devices. That’s not fluff. That’s the clue that Gemma is meant to travel.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780771739631-iroa.png\" alt=\"NVIDIA AI Models turn model hunting into a playbook\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>I like this section because it’s the least dramatic and most useful. Not every project needs a giant reasoning monster. Sometimes you need something you can run on a workstation, test quickly, and then move into a product without rebuilding the whole pipeline. Gemma fits that kind of work better than the “look how huge this model is” crowd.\u003C\u002Fp>\u003Cp>The page also notes that Gemma 3n is natively multilingual and multimodal for text, image, video, and audio. That matters because it changes the kind of app you can build without stitching together separate systems for every modality. 
NVIDIA then routes you to NIM for production-grade support, NeMo for customization, TensorRT-LLM for optimization, and Ollama for fast local experimentation.\u003C\u002Fp>\u003Cp>How to apply it:\u003C\u002Fp>\u003Cul>\u003Cli>Choose Gemma when your main constraint is portability across devices.\u003C\u002Fli>\u003Cli>Use \u003Ca href=\"https:\u002F\u002Follama.com\u002F\">Ollama\u003C\u002Fa> for a quick local test loop.\u003C\u002Fli>\u003Cli>Use \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTensorRT-LLM\">TensorRT-LLM\u003C\u002Fa> if you need to push throughput on NVIDIA GPUs.\u003C\u002Fli>\u003Cli>Use \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002F\">Hugging Face\u003C\u002Fa> if you want to fine-tune or adapt a smaller checkpoint with normal tooling.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>The page also points to sample applications and Jetson demos, which is NVIDIA’s way of saying: don’t overthink the first prototype. Get it running on the target class of device and see where the pain actually is.\u003C\u002Fp>\u003Ch2>gpt-oss is NVIDIA’s proof that open-weight models need a runtime plan\u003C\u002Fh2>\u003Cblockquote>\"NVIDIA has optimized both new open-weight models for 10x inference performance on NVIDIA Blackwell architecture, delivering up to 1.5 million tokens per second (TPS) on an NVIDIA GB200 NVL72 system.\"\u003C\u002Fblockquote>\u003Cp>What this actually means is: NVIDIA is not presenting gpt-oss as just another model family. It’s presenting it as a hardware-plus-runtime story. The model matters, but the runtime and kernel work matter just as much. If you ignore that, you miss the point of the page.\u003C\u002Fp>\u003Cp>I’m always suspicious when a page starts quoting throughput numbers without enough context, but I don’t need to treat this as a benchmark contest to see the pattern. 
The page is telling you that the same model can look very different depending on whether you run it through TensorRT-LLM, vLLM, SGLang, Ollama, or other supported paths. That’s the whole game.\u003C\u002Fp>\u003Cp>This is also where NVIDIA’s ecosystem strategy becomes obvious. The page references \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgpt-oss\">OpenAI’s gpt-oss\u003C\u002Fa> models, \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTensorRT-LLM\">TensorRT-LLM\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm\">vLLM\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fggml-org\u002Fllama.cpp\">llama.cpp\u003C\u002Fa>, and \u003Ca href=\"https:\u002F\u002Follama.com\u002F\">Ollama\u003C\u002Fa>. That’s not random. It’s showing you the same model family across multiple developer entry points.\u003C\u002Fp>\u003Cp>How to apply it:\u003C\u002Fp>\u003Cul>\u003Cli>Use gpt-oss when you want open-weight flexibility and care about deployment speed.\u003C\u002Fli>\u003Cli>Use TensorRT-LLM if your bottleneck is inference performance on Blackwell or Hopper.\u003C\u002Fli>\u003Cli>Use vLLM or SGLang if your team already lives in those serving stacks.\u003C\u002Fli>\u003Cli>Use Ollama or llama.cpp if you want a local-first developer loop.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>My take: this section is the clearest sign that “model choice” is now inseparable from “serving choice.” If you’re not thinking about both, you’re not really choosing a model. You’re just collecting names.\u003C\u002Fp>\u003Ch2>Kimi shows what happens when scale gets weird\u003C\u002Fh2>\u003Cblockquote>\"Kimi K2 is a state-of-the-art MoE language model with 32 billion activated parameters and 1 trillion total parameters.\"\u003C\u002Fblockquote>\u003Cp>What this actually means is: Kimi is the page’s example of a model family where the headline number is only half the story. 
Activated parameters and total parameters are not the same thing, and NVIDIA is clearly expecting you to understand that the serving path matters because the model is huge in a very particular way.\u003C\u002Fp>\u003Cp>The page says Kimi K2 Thinking MoE saw a 10x performance leap on NVIDIA GB200 NVL72 compared with NVIDIA HGX H200, and it calls out Fireworks AI deploying Kimi K2 on NVIDIA B200 to hit top leaderboard performance. Again, I’m not treating that as a universal truth for every setup. I’m reading it as a signal that the page wants you to think about scale, routing, and infrastructure together.\u003C\u002Fp>\u003Cp>This is where teams often get sloppy. They hear “open model” and assume the operational burden is low. It isn’t. Large MoE models can be very efficient in the right setup, but they can also become a mess if you don’t plan for routing, memory, and serving topology. The page keeps pointing back to optimized deployment paths because that’s where the real work is.\u003C\u002Fp>\u003Cp>How to apply it:\u003C\u002Fp>\u003Cul>\u003Cli>Use Kimi when you need a large open model and your infrastructure can actually support it.\u003C\u002Fli>\u003Cli>Check the NVIDIA NIM path if you want a packaged deployment option.\u003C\u002Fli>\u003Cli>Use TensorRT-LLM when you need to squeeze the most from the hardware you already own.\u003C\u002Fli>\u003Cli>Use the page’s sample links to validate whether your workload is reasoning-heavy, chat-heavy, or agent-heavy before you commit.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>I’d treat Kimi as the “read the fine print” family. 
If DeepSeek is the performance-first example and Gemma is the portable one, Kimi is the reminder that scale changes the deployment conversation in ways that marketing copy never explains well.\u003C\u002Fp>\u003Ch2>Llama is the familiar default, but NVIDIA still wants to tune it\u003C\u002Fh2>\u003Cblockquote>\"Llama is Meta’s collection of open foundation models, most recently made multimodal with the 2025 release of Llama 4.\"\u003C\u002Fblockquote>\u003Cp>What this actually means is: Llama is the family most developers already know, so NVIDIA is using it as the easiest on-ramp to the rest of the page. The page doesn’t just say “here’s Llama.” It says NVIDIA worked with \u003Ca href=\"\u002Ftag\u002Fmeta\">Meta\u003C\u002Fa> to advance inference using TensorRT-LLM, offers optimized versions as NIM microservices, and supports customization through NeMo.\u003C\u002Fp>\u003Cp>This is the section I’d expect most teams to start with, because Llama is the least surprising name on the page. That’s fine. Familiarity is useful. But the page is still making the same point: don’t stop at the model name. Decide whether you want local experimentation, optimized serving, or customization with your own data.\u003C\u002Fp>\u003Cp>I’ve lost more time than I care to admit by assuming the default model would also be the default operational path. Usually it isn’t. 
The page’s Llama section is basically NVIDIA saying, “Yes, use the thing you already know, but use it through our optimized stack if you care about performance.” Fair enough.\u003C\u002Fp>\u003Cp>How to apply it:\u003C\u002Fp>\u003Cul>\u003Cli>Use Llama when your team already has familiarity and you want the shortest path to a working prototype.\u003C\u002Fli>\u003Cli>Use \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fnim\">NIM\u003C\u002Fa> if you want a production-ready microservice instead of wiring everything by hand.\u003C\u002Fli>\u003Cli>Use \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTensorRT-LLM\">TensorRT-LLM\u003C\u002Fa> if you need better throughput on NVIDIA GPUs.\u003C\u002Fli>\u003Cli>Use \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FNeMo\">NeMo\u003C\u002Fa> when your business logic depends on your own data.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>There’s a reason Llama gets a long section here. It’s the bridge between “I know what this model is” and “I now need to run it like an adult.”\u003C\u002Fp>\u003Ch2>The real pattern is model, runtime, optimize, ship\u003C\u002Fh2>\u003Cblockquote>\"Get started with the right tools and frameworks for your development environment.\"\u003C\u002Fblockquote>\u003Cp>What this actually means is: NVIDIA wants the page to act like a workflow checklist. Every family follows the same arc. Explore the model. Integrate with a runtime. Optimize inference. Deploy a production-ready microservice. That’s the pattern I’d actually copy.\u003C\u002Fp>\u003Cp>This is the part most model pages get wrong. They either dump a list of checkpoints on you or they bury the deployment path under too much platform language. NVIDIA at least gives you the sequence, even if it’s wrapped in a lot of product names. 
Once you see the sequence, the page becomes useful instead of noisy.\u003C\u002Fp>\u003Cp>Here’s the practical version I’d use on a real project:\u003C\u002Fp>\u003Cul>\u003Cli>Pick one model family based on your use case, not hype.\u003C\u002Fli>\u003Cli>Prototype with the fastest path available, usually NIM or Ollama.\u003C\u002Fli>\u003Cli>Measure latency, memory use, and token throughput on your actual hardware.\u003C\u002Fli>\u003Cli>Move to TensorRT-LLM if optimization matters.\u003C\u002Fli>\u003Cli>Use NeMo only if you need customization or adaptation.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>That sequence saves me from the usual trap of over-investing in the wrong layer. A lot of teams start by fine-tuning when they should be benchmarking. Or they start by optimizing when they haven’t even proven the use case. This page is useful because it nudges you toward the right order.\u003C\u002Fp>\u003Cp>And yes, the page has a lot of NVIDIA-specific infrastructure around it: Blackwell, Hopper, Jetson, RTX, DGX, NIM, NeMo, TensorRT-LLM. I don’t think you need to memorize the whole stack. I think you need to know which layer solves which problem. 
That’s enough.\u003C\u002Fp>\u003Ch2>The template you can copy\u003C\u002Fh2>\u003Cpre>\u003Ccode># AI model selection template inspired by NVIDIA’s AI Models page\n\n## 1) What am I building?\n- Use case:\n- Primary input type: text \u002F image \u002F audio \u002F video \u002F multimodal\n- Primary constraint: latency \u002F cost \u002F portability \u002F customization \u002F throughput\n- Target deployment: local \u002F edge \u002F workstation \u002F data center \u002F cloud\n\n## 2) Which model family fits first?\n- DeepSeek: reasoning-heavy, performance-sensitive workloads\n- Gemma: lightweight, portable, multi-device workflows\n- gpt-oss: open-weight models with a strong serving\u002Fruntime focus\n- Kimi: large MoE workloads where scale and routing matter\n- Llama: familiar general-purpose foundation model path\n- Other: \n\n## 3) What is my first run path?\n- Fast prototype: NIM \u002F Ollama \u002F Hugging Face \u002F llama.cpp\n- Serving stack: TensorRT-LLM \u002F vLLM \u002F SGLang\n- Customization: NeMo \u002F Transformers \u002F PyTorch\n- Hardware target: Blackwell \u002F Hopper \u002F RTX \u002F Jetson\n\n## 4) What do I measure before I commit?\n- Tokens per second:\n- Time to first token:\n- Memory footprint:\n- Cost per request:\n- Quality on my own prompts:\n\n## 5) What is the next move if it works?\n- Keep the same model and optimize serving\n- Quantize the model\n- Move to NIM for packaging\n- Fine-tune or adapt with NeMo\n- Swap to a smaller or faster family\n\n## 6) Decision rule\nIf the model is good enough but too slow, optimize the runtime first.\nIf the model is too expensive, test a smaller family before fine-tuning.\nIf the model needs my data, customize after the benchmark, not before.\nIf the deployment target changes, re-evaluate the family instead of forcing the old choice.\n\n## 7) Copy-paste prompt for internal evaluation\nI need to choose an AI model for:\n[describe app]\n\nConstraints:\n- Deployment target: 
[local\u002Fedge\u002Fcloud\u002Fdata center]\n- Latency budget: [number]\n- Cost budget: [number]\n- Input types: [text\u002Fimage\u002Faudio\u002Fvideo]\n- Need for customization: [low\u002Fmedium\u002Fhigh]\n\nRecommend one model family from:\n- DeepSeek\n- Gemma\n- gpt-oss\n- Kimi\n- Llama\n\nThen recommend the first runtime path:\n- NIM\n- TensorRT-LLM\n- Ollama\n- vLLM\n- llama.cpp\n- NeMo\n\nExplain the choice in one paragraph and include the first benchmark I should run.\n\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>The nice thing about this template is that it forces the conversation away from model fandom and back toward shipping. That’s the whole point. If you can’t explain the deployment path, you don’t really have a model choice yet.\u003C\u002Fp>\u003Cp>Source-wise, this breakdown is based on NVIDIA’s AI Models page at \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fai-models\">https:\u002F\u002Fdeveloper.nvidia.com\u002Fai-models\u003C\u002Fa>. My structure, framing, and template are original, but the model family summaries and deployment paths come from NVIDIA’s published page and linked docs.\u003C\u002Fp>","I break down NVIDIA’s AI Models page into a practical workflow for picking, optimizing, and shipping open models.","developer.nvidia.com","https:\u002F\u002Fdeveloper.nvidia.com\u002Fai-models",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780771718084-8xiy.png","tools","en","fd6e6e2e-4036-4fc0-8521-0d8237178f85",[17,18,19,20,21],"NVIDIA","AI models","TensorRT-LLM","NIM","open models",[23,24,25],"Treat the page as a deployment workflow, not a catalog.","Pick runtime paths before comparing model names.","Use the template to turn model choice into a repeatable 
decision.",0,"2026-06-06T18:48:07.10885+00:00","2026-06-06T18:48:07.099+00:00","a7343b93-37cc-4634-a2bc-707f6275bdb6",{"tags":31,"relatedLang":43,"relatedPosts":47},[32,34,37,39,41],{"name":20,"slug":33},"nim",{"name":35,"slug":36},"Nvidia","nvidia",{"name":21,"slug":38},"open-models",{"name":19,"slug":40},"tensorrt-llm",{"name":18,"slug":42},"ai-models",{"id":15,"slug":44,"title":45,"language":46},"nvidia-ai-models-playbook-zh","NVIDIA AI Models 把選模變成流程","zh",[48,54,60,66,72,78],{"id":49,"slug":50,"title":51,"cover_image":52,"image_url":52,"created_at":53,"category":13},"4065ada8-125b-4286-85c5-85cfe7d6369a","llm-leaderboard-2026-300-models-ranked-en","LLM Leaderboard 2026: 300+ Models Ranked","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780776189065-qk79.png","2026-06-06T20:02:37.334702+00:00",{"id":55,"slug":56,"title":57,"cover_image":58,"image_url":58,"created_at":59,"category":13},"92a22a3d-6d0c-4884-9865-c1fe0f2e5e78","llama-benchy-llama-bench-style-api-benchmarks-en","llama-benchy brings llama-bench tests to APIs","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780775297695-nchl.png","2026-06-06T19:47:54.675055+00:00",{"id":61,"slug":62,"title":63,"cover_image":64,"image_url":64,"created_at":65,"category":13},"df69beef-d6a6-40d1-9284-474eebad74b7","how-to-start-vibe-coding-with-ai-en","How to Start Vibe Coding with AI","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780773471455-gav1.png","2026-06-06T19:17:22.823911+00:00",{"id":67,"slug":68,"title":69,"cover_image":70,"image_url":70,"created_at":71,"category":13},"40bf1841-77d9-4bf6-9764-3e956510d41a","kimi-k25-claude-code-cline-roocode-setup-en","Kimi K2.5 works in Claude Code and 
Cline","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780769031200-z2kv.png","2026-06-06T18:03:19.685945+00:00",{"id":73,"slug":74,"title":75,"cover_image":76,"image_url":76,"created_at":77,"category":13},"258a698f-2ab5-47bf-9b3b-ec8a8e14b8be","why-small-businesses-should-use-ai-for-admin-en","Why small businesses should use AI for admin, not everything","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780758184970-z888.png","2026-06-06T15:02:18.347592+00:00",{"id":79,"slug":80,"title":81,"cover_image":82,"image_url":82,"created_at":83,"category":13},"7da5424f-1ff8-483a-80ed-7091c5b0454b","crun-ai-gemini-omni-chat-video-editing-en","Crun AI turns Gemini Omni into chat video editing","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780733910991-ji5m.png","2026-06-06T08:18:00.680201+00:00",[85,90,95,100,105,110,115,120,125,130],{"id":86,"slug":87,"title":88,"created_at":89},"8008f1a9-7a00-4bad-88c9-3eedc9c6b4b1","surepath-ai-mcp-policy-controls-en","SurePath AI's New MCP Policy Controls Enhance AI Security","2026-03-26T01:26:52.222015+00:00",{"id":91,"slug":92,"title":93,"created_at":94},"27e39a8f-b65d-4f7b-a875-859e2b210156","mcp-standard-ai-tools-2026-en","MCP Standard in 2026: Integrating AI Tools","2026-03-26T01:27:43.127519+00:00",{"id":96,"slug":97,"title":98,"created_at":99},"165f9a19-c92d-46ba-b3f0-7125f662921d","rag-2026-transforming-enterprise-ai-en","How RAG in 2026 is Transforming Enterprise AI","2026-03-26T01:28:11.485236+00:00",{"id":101,"slug":102,"title":103,"created_at":104},"6a2a8e6e-b956-49d8-be12-cc47bdc132b2","mastering-ai-prompts-2026-guide-en","Mastering AI Prompts: A 2026 Guide for 
Developers","2026-03-26T01:29:07.835148+00:00",{"id":106,"slug":107,"title":108,"created_at":109},"3ab2c67e-4664-4c67-a013-687a2f605814","garry-tan-open-sources-claude-code-toolkit-en","Garry Tan Open-Sources a Claude Code Toolkit","2026-03-26T08:26:20.245934+00:00",{"id":111,"slug":112,"title":113,"created_at":114},"66a7cbf8-7e76-41d4-9bbf-eaca9761bf69","github-ai-projects-to-watch-in-2026-en","20 GitHub AI Projects to Watch in 2026","2026-03-26T08:28:09.752027+00:00",{"id":116,"slug":117,"title":118,"created_at":119},"9f332fda-eace-448a-a292-2283951eee71","practical-github-guide-learning-ml-2026-en","A Practical GitHub Guide to Learning ML in 2026","2026-03-27T01:16:50.125678+00:00",{"id":121,"slug":122,"title":123,"created_at":124},"1b1f637d-0f4d-42bd-974b-07b53829144d","aiml-2026-student-ai-ml-lab-repo-review-en","AIML-2026 Is a Bare-Bones Student Lab Repo","2026-03-27T01:21:51.661231+00:00",{"id":126,"slug":127,"title":128,"created_at":129},"6d1bf3f6-e191-4d30-b55b-8a0722fa6afe","ai-trending-github-repos-and-research-feeds-en","AI Trending Tracks Repos and Research Feeds","2026-03-27T01:31:35.709532+00:00",{"id":131,"slug":132,"title":133,"created_at":134},"010539a1-4c3a-4bd3-937a-26616422ee0d","awesome-ai-for-science-research-tools-map-en","Awesome AI for Science Is Becoming a Real Research Map","2026-03-27T01:46:50.89513+00:00"]