[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-cuda-cores-memory-tensor-cores-win-en":3,"article-related-cuda-cores-memory-tensor-cores-win-en":31,"series-industry-5417136f-52b3-4b04-9c9b-5cbb4df36584":85},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":23,"views":27,"created_at":28,"published_at":29,"topic_cluster_id":30},"5417136f-52b3-4b04-9c9b-5cbb4df36584","cuda-cores-memory-tensor-cores-win-en","CUDA cores matter, but memory and Tensor Cores win","\u003Cp data-speakable=\"summary\">\u003Ca href=\"\u002Ftag\u002Fcuda\">CUDA\u003C\u002Fa> cores help speed AI training, but memory, architecture, and Tensor Cores often matter more.\u003C\u002Fp>\n\u003Cp>If you are choosing a \u003Ca href=\"\u002Ftag\u002Fgpu\">GPU\u003C\u002Fa> for AI work, this guide shows what CUDA cores do, how they differ from Tensor Cores, and why raw core count is not the whole story. 
One useful \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa>: an RTX 4090 has 16,384 CUDA cores and a rated peak of about 82.6 trillion FP32 operations per second (TFLOPS) at its boost clock.\u003C\u002Fp>\n\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Item\u003C\u002Fth>\u003Cth>CUDA cores\u003C\u002Fth>\u003Cth>Memory\u003C\u002Fth>\u003Cth>Cloud price (Thunder Compute)\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>RTX A6000\u003C\u002Ftd>\u003Ctd>10,752\u003C\u002Ftd>\u003Ctd>48 GB GDDR6\u003C\u002Ftd>\u003Ctd>$0.35\u002Fhr\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>A100 80GB\u003C\u002Ftd>\u003Ctd>6,912\u003C\u002Ftd>\u003Ctd>80 GB HBM2e\u003C\u002Ftd>\u003Ctd>$0.78\u002Fhr\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>L40\u003C\u002Ftd>\u003Ctd>18,176\u003C\u002Ftd>\u003Ctd>48 GB GDDR6\u003C\u002Ftd>\u003Ctd>$0.89\u002Fhr\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>L40S\u003C\u002Ftd>\u003Ctd>18,176\u003C\u002Ftd>\u003Ctd>48 GB GDDR6\u003C\u002Ftd>\u003Ctd>$0.99\u002Fhr\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>H100 80GB\u003C\u002Ftd>\u003Ctd>14,592\u003C\u002Ftd>\u003Ctd>80 GB HBM3\u003C\u002Ftd>\u003Ctd>$1.38\u002Fhr\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Ch2>1. CUDA cores are the GPU’s general-purpose workers\u003C\u002Fh2>\n\u003Cp>CUDA stands for Compute Unified Device Architecture, \u003Ca href=\"\u002Ftag\u002Fnvidia\">NVIDIA\u003C\u002Fa>’s platform for programming its GPUs. 
CUDA cores are the physical processing units inside those GPUs. Each one executes simple integer and floating-point arithmetic, such as additions, multiplications, and fused multiply-adds, and thousands of them run in parallel.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781110970740-rly0.png\" alt=\"CUDA cores matter, but memory and Tensor Cores win\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\n\u003Cp>That many-core design lets a GPU work on many small jobs at once, while a CPU devotes its fewer, more complex cores to sequential and branch-heavy tasks. A GPU with thousands of CUDA cores can push through repetitive calculations far faster than a CPU when the workload splits cleanly into parallel pieces.\u003C\u002Fp>\n\u003Cul>\n  \u003Cli>Best fit: floating-point math, integer math, parallel compute\u003C\u002Fli>\n  \u003Cli>Common use cases: graphics, scientific computing, mining, AI preprocessing\u003C\u002Fli>\n  \u003Cli>Example: an RTX 4090 has 16,384 CUDA cores\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch2>2. Tensor Cores do the heavy AI matrix work\u003C\u002Fh2>\n\u003Cp>CUDA cores are generalists, while Tensor Cores are specialists built for deep learning. Introduced with the Volta architecture in 2017, Tensor Cores accelerate the matrix multiply-accumulate operations at the heart of training and \u003Ca href=\"\u002Ftag\u002Finference\">inference\u003C\u002Fa>. Volta’s Tensor Cores worked on FP16 inputs; later generations added INT8 (Turing) and TF32 and BF16 (Ampere).\u003C\u002Fp>\n\u003Cp>In practice, that means Tensor Cores often drive the biggest gains in modern AI training. Thunder Compute notes that Tensor Cores can make neural-network training up to 20 times faster than CUDA cores alone, because each Tensor Core completes a multiply-accumulate on an entire small matrix block per clock cycle instead of one multiply-add at a time.\u003C\u002Fp>\n\u003Cpre>\u003Ccode>CUDA cores: preprocessing, activations, non-matrix math\nTensor Cores: matrix multiplies in attention and convolutions\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Ch2>3. 
More CUDA cores do not always mean faster training\u003C\u002Fh2>\n\u003Cp>A higher core count can help, but it is not a reliable shortcut to better performance. Memory bandwidth, cache behavior, clock speed, architecture, and memory capacity can outweigh raw CUDA core totals in real workloads.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781110970218-nks0.png\" alt=\"CUDA cores matter, but memory and Tensor Cores win\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\n\u003Cp>The RTX 4080 is a good example: it has 9,728 CUDA cores, fewer than the RTX 3090’s 10,496, yet it often performs better because of its newer Ada Lovelace architecture, much higher clock speeds, and far larger L2 cache (64 MB versus 6 MB). The comparison cuts both ways for AI, though: the 3090 has more VRAM (24 GB versus 16 GB) and more raw memory bandwidth, so it can still be the better choice for models that need more than 16 GB. For AI specifically, Tensor Core generation and available VRAM often matter more than the CUDA core number on the box.\u003C\u002Fp>\n\u003Cul>\n  \u003Cli>Check memory bandwidth before comparing core counts\u003C\u002Fli>\n  \u003Cli>Check VRAM size if your dataset or model is large\u003C\u002Fli>\n  \u003Cli>Check architecture generation, not just spec-sheet totals\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch2>4. CUDA performance depends on how the chip moves data\u003C\u002Fh2>\n\u003Cp>CUDA cores live inside Streaming Multiprocessors (SMs), and each SM’s schedulers issue instructions to warps, which are groups of 32 threads that execute together. That setup only works well when data reaches the cores efficiently through the memory hierarchy: registers, shared memory, and global memory.\u003C\u002Fp>\n\u003Cp>This is why a GPU can look strong on paper and still underperform in practice. If memory access is slow or the workload is poorly organized, the cores sit idle. 
For AI training, the fastest card is often the one that keeps compute and memory in balance.\u003C\u002Fp>\n\u003Cul>\n  \u003Cli>SMs group CUDA cores, Tensor Cores, registers, and shared memory into execution units\u003C\u002Fli>\n  \u003Cli>Each warp’s 32 threads step through the same instruction together\u003C\u002Fli>\n  \u003Cli>Memory hierarchy affects real throughput as much as core count\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch2>5. CUDA matters most when you pick the right GPU tier\u003C\u002Fh2>\n\u003Cp>CUDA runs only on NVIDIA GPUs, so your choice usually comes down to data center, workstation, or consumer hardware. A100 and H100 cards are built for large-scale training, while RTX-class cards are often more cost-effective for prototyping, fine-tuning, and inference.\u003C\u002Fp>\n\u003Cp>Cloud access makes that choice easier because you can test different configurations without buying hardware. Thunder Compute offers CUDA-powered instances starting at $0.35\u002Fhr for an RTX A6000, with A100 80GB at $0.78\u002Fhr and H100 at $1.38\u002Fhr, plus CUDA preinstalled for PyTorch, TensorFlow, and custom kernels.\u003C\u002Fp>\n\u003Cul>\n  \u003Cli>RTX A6000: good starting point for prototyping\u003C\u002Fli>\n  \u003Cli>A100 80GB: strong for larger models and memory-heavy runs\u003C\u002Fli>\n  \u003Cli>H100 80GB: best for serious training when budget allows\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch2>How to decide\u003C\u002Fh2>\n\u003Cp>If you care about general CUDA development, look for a balanced GPU with enough cores, enough VRAM, and decent memory bandwidth. If you care about AI training, prioritize Tensor Cores and memory capacity first, then compare CUDA core counts as a secondary detail.\u003C\u002Fp>\n\u003Cp>For small teams and individual builders, cloud GPUs can be the simplest path. 
Start with a cheaper RTX-class instance, then move to A100 or H100 only when your model size, batch size, or training time justifies the jump.\u003C\u002Fp>","5 CUDA-core facts that show why GPU training speed depends on more than raw core count.","www.thundercompute.com","https:\u002F\u002Fwww.thundercompute.com\u002Fblog\u002Fcuda-cores-explained-ai-training",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781110970740-rly0.png","industry","en","15c682f7-7da1-4ecc-abb8-adec3192f9e4",[17,18,19,20,21,22],"CUDA cores","Tensor Cores","NVIDIA GPUs","AI training","GPU memory bandwidth","Thunder Compute",[24,25,26],"CUDA cores handle general parallel math, but they are only one part of GPU speed.","Tensor Cores usually matter more for AI training than raw CUDA core count.","Memory bandwidth, VRAM, and architecture can outweigh a higher core total.",0,"2026-06-10T17:02:25.255458+00:00","2026-06-10T17:02:25.248+00:00","a1c158f8-b98b-4d99-aa84-35523d1f1876",{"tags":32,"relatedLang":44,"relatedPosts":48},[33,35,37,39,42],{"name":21,"slug":34},"gpu-memory-bandwidth",{"name":20,"slug":36},"ai-training",{"name":17,"slug":38},"cuda-cores",{"name":40,"slug":41},"Nvidia GPUs","nvidia-gpus",{"name":18,"slug":43},"tensor-cores",{"id":15,"slug":45,"title":46,"language":47},"cuda-cores-memory-tensor-cores-win-zh","CUDA 核心重要，但記憶體與 Tensor Core 才決定訓練速度","zh",[49,55,61,67,73,79],{"id":50,"slug":51,"title":52,"cover_image":53,"image_url":53,"created_at":54,"category":13},"24ebb482-f6c2-405c-967e-61549b265310","manus-series-b-competitors-profile-en","Manus Raises Series B and Faces Box, 
Airtable","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781113677369-vi69.png","2026-06-10T17:47:29.876234+00:00",{"id":56,"slug":57,"title":58,"cover_image":59,"image_url":59,"created_at":60,"category":13},"016d4913-2f23-417d-b0a7-b45610852b8d","reid-hoffman-exit-microsoft-board-right-move-en","Reid Hoffman’s exit from Microsoft’s board is the right move","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781112768133-whcv.png","2026-06-10T17:32:20.518936+00:00",{"id":62,"slug":63,"title":64,"cover_image":65,"image_url":65,"created_at":66,"category":13},"547493e6-e515-4aa3-bdcd-66eba6cf1fd5","codex-0-139-0-web-search-tooling-en","Codex 0.139.0 adds web search and cleaner tooling","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781104674116-naze.png","2026-06-10T15:17:22.112795+00:00",{"id":68,"slug":69,"title":70,"cover_image":71,"image_url":71,"created_at":72,"category":13},"cf1e4743-3203-4145-97b5-41be640b5547","docker-github-org-container-work-en","Docker’s GitHub org shows where container work happens","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781101078934-w2yp.png","2026-06-10T14:17:22.825207+00:00",{"id":74,"slug":75,"title":76,"cover_image":77,"image_url":77,"created_at":78,"category":13},"be0d131f-9772-4b0e-8931-c9ede1d8ce55","cursor-mac-update-stuck-old-version-en","Cursor on Mac can get stuck on old 
versions","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781093876357-245p.png","2026-06-10T12:17:20.258974+00:00",{"id":80,"slug":81,"title":82,"cover_image":83,"image_url":83,"created_at":84,"category":13},"8b7626e4-c384-4703-b268-61e5626a4236","openai-ipo-wall-street-ai-test-en","OpenAI’s IPO will expose AI hype to Wall Street","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781067770153-r09j.png","2026-06-10T05:02:22.694946+00:00",[86,91,96,101,106,111,116,121,126,131],{"id":87,"slug":88,"title":89,"created_at":90},"d35a1bd9-e709-412e-a2df-392df1dc572a","ai-impact-2026-developments-market-en","AI's Impact in 2026: Key Developments and Market Shifts","2026-03-25T16:20:33.205823+00:00",{"id":92,"slug":93,"title":94,"created_at":95},"5ed27921-5fd6-492e-8c59-78393bf37710","trumps-ai-legislative-framework-en","Trump's AI Legislative Framework: What's Inside?","2026-03-25T16:22:20.005325+00:00",{"id":97,"slug":98,"title":99,"created_at":100},"e454a642-f03c-4794-b185-5f651aebbaca","nvidia-gtc-2026-key-highlights-innovations-en","NVIDIA GTC 2026: Key Highlights and Innovations","2026-03-25T16:22:47.882615+00:00",{"id":102,"slug":103,"title":104,"created_at":105},"0ebb5b16-774a-4922-945d-5f2ce1df5a6d","claude-usage-diversifies-learning-curves-en","Claude Usage Diversifies, Learning Curves Emerge","2026-03-25T16:25:50.770376+00:00",{"id":107,"slug":108,"title":109,"created_at":110},"69934e86-2fc5-4280-8223-7b917a48ace8","openclaw-ai-commoditization-concerns-en","OpenClaw's Rise Raises Concerns of AI Model Commoditization","2026-03-25T16:26:30.582047+00:00",{"id":112,"slug":113,"title":114,"created_at":115},"b4b2575b-2ac8-46b2-b90e-ab1d7c060797","google-gemini-ai-rollout-2026-en","Google's Gemini AI Rollout Extended to 
2026","2026-03-25T16:28:14.808842+00:00",{"id":117,"slug":118,"title":119,"created_at":120},"6e18bc65-42ae-4ad0-b564-67d7f66b979e","meta-llama4-fabricated-results-scandal-en","Meta's Llama 4 Scandal: Fabricated AI Test Results Unveiled","2026-03-25T16:29:15.482836+00:00",{"id":122,"slug":123,"title":124,"created_at":125},"bf888e9d-08be-4f47-996c-7b24b5ab3500","accenture-mistral-ai-deployment-en","Accenture and Mistral AI Team Up for AI Deployment","2026-03-25T16:31:01.894655+00:00",{"id":127,"slug":128,"title":129,"created_at":130},"5382b536-fad2-49c6-ac85-9eb2bae49f35","mistral-ai-high-stakes-2026-en","Mistral AI: Facing High Stakes in 2026","2026-03-25T16:31:39.941974+00:00",{"id":132,"slug":133,"title":134,"created_at":135},"9da3d2d6-b669-4971-ba1d-17fdb3548ed5","cursors-meteoric-rise-pressures-en","Cursor's Meteoric Rise Faces Industry Pressures","2026-03-25T16:32:21.899217+00:00"]