[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-cuda-tile-basic-nvidia-april-fools-post-zh":3,"article-related-cuda-tile-basic-nvidia-april-fools-post-zh":29,"series-tools-a5f71507-4d4c-434f-834b-5fbe0405a5d9":88},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":11,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":11},"a5f71507-4d4c-434f-834b-5fbe0405a5d9","cuda-tile-basic-nvidia-april-fools-post-zh","NVIDIA 把 CUDA Tile 搬進 BASIC","\u003Cp>NVIDIA 在 2026 年 4 月 1 日丟出一篇很會玩的文。主角是 \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fcuda-tile-programming-now-available-for-basic\u002F\" target=\"_blank\" rel=\"noopener\">cuTile BASIC\u003C\u002Fa>。它把 \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-toolkit\" target=\"_blank\" rel=\"noopener\">CUDA 13.1\u003C\u002Fa> 的 tile-based GPU 編程，包進 BASIC 外皮。\u003C\u002Fp>\u003Cp>這梗很鬧，但不是純搞笑。文章裡的範例真的在講 tile、MMA、資料分塊。講白了，NVIDIA 想證明一件事：GPU 程式不必永遠綁死在 CUDA C++。\u003C\u002Fp>\u003Cp>如果你寫過 kernel，就知道痛點在哪。你要管 thread、block、launch config，還要顧 memory access。cuTile BASIC 的意思很直接：把心力移到資料切塊，剩下交給編譯器。\u003C\u002Fp>\u003Ch2>這篇文章到底在秀什麼\u003C\u002Fh2>\u003Cp>核心其實是 \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fnvidia-cuda-13-1-powers-next-gen-gpu-programming-with-nvidia-cuda-tile-and-performance-gains\u002F\" target=\"_blank\" rel=\"noopener\">CUDA Tile\u003C\u002Fa>。它是 CUDA 13.1 裡的 tile-based programming model。開發者先描述資料怎麼切成 tile，再描述 tile 上要做什麼。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775142782708-1euw.png\" alt=\"NVIDIA 把 CUDA Tile 搬進 BASIC\" class=\"rounded-xl w-full\" 
loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>這種寫法很適合 GPU。因為很多工作本來就是矩陣、向量、分塊運算。你不用每次都手動算 thread index。你只要講清楚資料區塊，後面讓工具處理。\u003C\u002Fp>\u003Cp>文章拿 BASIC 來示範，也不是亂選。BASIC 是很多人第一個學的語言。它有行號，也夠老派。NVIDIA 故意用它，來凸顯 tile IR 的語言無關性。\u003C\u002Fp>\u003Cul>\u003Cli>\u003Ca href=\"https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-toolkit-release-notes\u002Findex.html\" target=\"_blank\" rel=\"noopener\">CUDA Toolkit 13.1\u003C\u002Fa> 是基礎版本。\u003C\u002Fli>\u003Cli>文章提到的 GPU 需要 compute capability 8.x 到 12.x。\u003C\u002Fli>\u003Cli>驅動需求是 \u003Ca href=\"https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-toolkit-release-notes\u002Findex.html\" target=\"_blank\" rel=\"noopener\">R580\u003C\u002Fa> 以上。\u003C\u002Fli>\u003Cli>Python 3.10 也在安裝清單裡。\u003C\u002Fli>\u003Cli>套件是透過 \u003Ca href=\"https:\u002F\u002Fpip.pypa.io\u002Fen\u002Fstable\u002F\" target=\"_blank\" rel=\"noopener\">pip\u003C\u002Fa> 裝進去的。\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>BASIC 只是笑點，技術也是真的\u003C\u002Fh2>\u003Cp>文章最妙的地方，是它把懷舊梗和實作混在一起。你會看到 line number、老電腦、甚至像在寫學校作業的語氣。然後下一秒，它就拿出一段 vector add 程式。\u003C\u002Fp>\u003Cp>那段程式很短。它用 \u003Ccode>TILE\u003C\u002Fcode> 和 \u003Ccode>BID\u003C\u002Fcode> 表達資料分塊。你不用手算每個 thread 的位置。這對看慣 CUDA C++ 的人來說，衝擊很大。\u003C\u002Fp>\u003Cp>更有意思的是，這不是單純語法糖。它背後是 tile IR。也就是說，BASIC 只是前端之一。真正重要的是中間層可以接很多語言。\u003C\u002Fp>\u003Cblockquote>“CUDA Tile, introduced in CUDA 13.1, enables flexible tile-based GPU programming from any language.” — NVIDIA Technical Blog\u003C\u002Fblockquote>\u003Cp>矩陣乘法的例子更有感。文章用了 \u003Ccode>MMA\u003C\u002Fcode>，還寫出像 \u003Ccode>A(128, 32)\u003C\u002Fcode>、\u003Ccode>B(32, 128)\u003C\u002Fcode>、\u003Ccode>C(128, 128)\u003C\u002Fcode> 這種 tile 尺寸。這些數字不是裝飾。這就是 GPU 最常見的思考方式。\u003C\u002Fp>\u003Cp>說真的，這種示範很聰明。因為它讓人一眼看懂。你不用先讀 200 行 kernel，才知道資料怎麼跑。對教學、原型、舊系統改造都很有用。\u003C\u002Fp>\u003Ch2>跟傳統 CUDA 比，差在哪\u003C\u002Fh2>\u003Cp>傳統 CUDA C++ 很強。這點沒人會嘴。你可以精準控制 thread mapping、shared memory、warp 行為。代價就是語法很吵，心智負擔也高。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg 
src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775142776921-rdml.png\" alt=\"NVIDIA 把 CUDA Tile 搬進 BASIC\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>cuTile BASIC 的路線完全不同。它把重點放在 tile。你先定義資料區塊，再定義區塊上的運算。編譯器去處理很多底層細節。\u003C\u002Fp>\u003Cp>這種抽象有好處，也有代價。好處是可讀性高。代價是你少了一些手動調校空間。要榨乾最後幾個百分點效能，還是得回到更底層的工具。\u003C\u002Fp>\u003Cul>\u003Cli>CUDA C++：你要自己管 \u003Ccode>threadIdx.x\u003C\u002Fcode> 和 \u003Ccode>blockIdx.x\u003C\u002Fcode>。\u003C\u002Fli>\u003Cli>cuTile BASIC：你直接對 tile 做運算。\u003C\u002Fli>\u003Cli>CUDA C++：launch geometry 要自己算。\u003C\u002Fli>\u003Cli>cuTile BASIC：很多配置交給 compiler 和 runtime。\u003C\u002Fli>\u003Cli>CUDA C++：適合極致調校。\u003C\u002Fli>\u003Cli>cuTile BASIC：適合教學、移植、快速驗證。\u003C\u002Fli>\u003C\u002Ful>\u003Cp>文章裡的測試數字也不是空話。vector add 跑 1,024 個元素。GEMM 則是 512x512 矩陣。結果還會檢查誤差，像 max difference 0.000012 這種值，代表它不是只做表面功夫。\u003C\u002Fp>\u003Cp>我覺得這裡最重要的訊號，是 NVIDIA 在推一個共享後端。前端可以很多種。BASIC、Python、Julia、甚至別的 DSL，都有機會接上去。這比單一語言工具鏈更有彈性。\u003C\u002Fp>\u003Ch2>這跟其他方案比，位置在哪\u003C\u002Fh2>\u003Cp>如果拿來跟一般 GPU 生態比，cuTile BASIC 很像一種介於教學與正式工具之間的東西。它不像 \u003Ca href=\"https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-c-programming-guide\u002F\" target=\"_blank\" rel=\"noopener\">CUDA C Programming Guide\u003C\u002Fa> 那麼底層，也不像高階框架那麼黑盒。\u003C\u002Fp>\u003Cp>對比 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fnvidia\u002Fcuda-tile\" target=\"_blank\" rel=\"noopener\">NVIDIA 的 GitHub 範例\u003C\u002Fa>，你可以看出方向很清楚。NVIDIA 想把 tile 當成共通語言。前端可以換，資料切塊的邏輯不換。\u003C\u002Fp>\u003Cp>這也讓人想到其他語言的 GPU 路線。像 \u003Ca href=\"https:\u002F\u002Fwww.julialang.org\u002F\" target=\"_blank\" rel=\"noopener\">Julia\u003C\u002Fa> 社群就很愛這種高階表達方式。OraCore.dev 之前也寫過 \u003Ca 
href=\"\u002Fnews\u002Fcutile-jl-brings-nvidia-cuda-tile-based-programming-to-julia\">cuTile.jl\u003C\u002Fa>。那篇和這次 BASIC 的邏輯很像。\u003C\u002Fp>\u003Cul>\u003Cli>傳統 CUDA：控制力最強。\u003C\u002Fli>\u003Cli>cuTile BASIC：語法最短，讀起來最直白。\u003C\u002Fli>\u003Cli>Julia 方案：適合研究和數值運算。\u003C\u002Fli>\u003Cli>Python 方案：適合資料科學團隊。\u003C\u002Fli>\u003Cli>DSL 路線：適合特定領域工作負載。\u003C\u002Fli>\u003C\u002Ful>\u003Cp>如果你問我，這種設計最有價值的地方，不是 BASIC 本身。是它證明 tile backend 可以吃下奇怪前端。這代表未來很多舊語言，也可能找到 GPU 出口。\u003C\u002Fp>\u003Cp>這對企業很實際。很多公司還留著老系統。不是每個團隊都能把 Fortran、BASIC 或自家 DSL 全部重寫。能接上 GPU，才是重點。\u003C\u002Fp>\u003Ch2>這件事放回產業脈絡看\u003C\u002Fh2>\u003Cp>GPU 編程這幾年一直在分層。底層是 CUDA、PTX、driver。上層則是各種框架、DSL、編譯器。大家都想少碰硬體細節。\u003C\u002Fp>\u003Cp>這不是偷懶。是成本問題。開發者時間很貴。能少寫 300 行樣板碼，就少掉很多 bug。尤其是矩陣運算、推論、資料搬移這類工作。\u003C\u002Fp>\u003Cp>所以 tile-based 編程很合理。它把運算單位從 thread 拉回資料。這跟現代 AI 和 HPC 的工作型態很合。很多模型本來就是大塊矩陣在跑。\u003C\u002Fp>\u003Cp>我覺得 NVIDIA 這篇 April Fools 文，其實是在測風向。它一邊玩笑，一邊告訴大家：tile IR 不是玩具。它要變成平台。\u003C\u002Fp>\u003Cp>這種做法也有市場意義。當工具鏈夠彈性，生態就比較容易長。開發者不一定愛 BASIC，但會在意「我能不能用熟悉的語言碰 GPU」。這才是重點。\u003C\u002Fp>\u003Ch2>我怎麼看這個梗\u003C\u002Fh2>\u003Cp>老實說，這篇很會寫。它的笑點夠老派，技術點也夠硬。不是那種只會丟梗圖的行銷文。它真的有 demo，也真的有數字。\u003C\u002Fp>\u003Cp>如果你是 GPU 開發者，這篇值得看。不是因為 BASIC 很酷。是因為它提醒你，編譯器和 IR 可能比語言本身更重要。\u003C\u002Fp>\u003Cp>接下來我會盯兩件事。第一，還會有哪些語言接上 tile backend。第二，這套模型在真實工作負載上，能不能少掉更多 boilerplate，又不犧牲太多效能。\u003C\u002Fp>\u003Cp>講白了，這次的重點不是 BASIC。是 NVIDIA 在告訴大家：GPU 程式可以更像在描述資料，而不是在手刻座標。你如果還在維護老程式，現在就該想想，哪個模組最適合先試 tile 化。\u003C\u002Fp>","NVIDIA 的 4 月 1 日文章把 CUDA Tile 接到 BASIC，拿 70 年代語言示範現代 GPU tile 編程。笑點很多，但背後的編譯器設計很認真。","developer.nvidia.com","https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fcuda-tile-programming-now-available-for-basic\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775142782708-1euw.png","tools","zh","5eeb9239-a844-49ff-9727-b76676dc8447",[17,18,19,20,21,22,23,24,25],"NVIDIA","CUDA Tile","cuTile BASIC","GPU 編程","BASIC","tile-based 
programming","CUDA 13.1","矩陣乘法","GPU kernel",5,"2026-04-02T15:12:38.50232+00:00","2026-04-02T15:12:38.286+00:00",{"tags":30,"relatedLang":47,"relatedPosts":51},[31,33,35,38,40,42,43,45],{"name":22,"slug":32},"tile-based-programming",{"name":25,"slug":34},"gpu-kernel",{"name":36,"slug":37},"Nvidia","nvidia",{"name":20,"slug":39},"gpu-編程",{"name":23,"slug":41},"cuda-131",{"name":24,"slug":24},{"name":18,"slug":44},"cuda-tile",{"name":21,"slug":46},"basic",{"id":15,"slug":48,"title":49,"language":50},"cuda-tile-basic-nvidia-april-fools-post-en","CUDA Tile Comes to BASIC in NVIDIA’s April Fools Post","en",[52,58,64,70,76,82],{"id":53,"slug":54,"title":55,"cover_image":56,"image_url":56,"created_at":57,"category":13},"91822854-0010-478e-b70c-6a624d039703","cloudflare-turns-startup-traffic-into-a-moat-zh","Cloudflare 讓流量變護城河","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780590804649-xc2z.png","2026-06-04T16:32:50.96702+00:00",{"id":59,"slug":60,"title":61,"cover_image":62,"image_url":62,"created_at":63,"category":13},"6ea3977e-ea7f-4d71-9472-08b512f81593","ai-code-review-tools-catch-hard-bugs-zh","AI code review 讓你抓到硬 bug","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780582701702-jnoi.png","2026-06-04T14:17:50.313258+00:00",{"id":65,"slug":66,"title":67,"cover_image":68,"image_url":68,"created_at":69,"category":13},"0342ff17-feea-4e43-81ff-d12c43cc93c0","claude-partner-network-learning-path-launches-zh","Claude 合作夥伴課程上線","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780578178111-1za9.png","2026-06-04T13:02:27.319581+00:00",{"id":71,"slug":72,"title":73,"cover_image":74,"image_url":74,"created_at":75,"category":13},"1a92ac0a-75ea-4877-874d-4a309cd0085b","nvidia-research-gpu-template-zh","NVIDIA 研究頁把 GPU 
資源變模板","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780567412863-e8oq.png","2026-06-04T10:02:58.043845+00:00",{"id":77,"slug":78,"title":79,"cover_image":80,"image_url":80,"created_at":81,"category":13},"3ead09ec-5656-4165-9bb0-f602add3c409","qdrant-filter-first-rag-design-decoded-zh","Qdrant 讓 RAG 先過濾再找相似","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780566519640-bdds.png","2026-06-04T09:47:59.450347+00:00",{"id":83,"slug":84,"title":85,"cover_image":86,"image_url":86,"created_at":87,"category":13},"7b5e6965-307e-4492-bf65-d922cd7818ad","anthropic-code-review-tool-ai-generated-code-zh","Anthropic 讓 AI 程式變可審","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780563813320-5wc7.png","2026-06-04T09:02:56.999212+00:00",[89,94,99,104,109,114,119,124,129,134],{"id":90,"slug":91,"title":92,"created_at":93},"855cd52f-6fab-46cc-a7c1-42195e8a0de4","surepath-real-time-mcp-policy-controls-zh","SurePath 推出即時 MCP 政策控管","2026-03-26T07:57:40.77233+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"9b19ab54-edef-4dbd-9ce4-a51e4bae4ebb","mcp-in-2026-the-ai-tool-layer-teams-use-zh","2026 年 MCP：團隊真的在用的 AI 工具層","2026-03-26T08:01:46.589694+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"af9c46c3-7a28-410b-9f04-32b3de30a68c","prompting-in-2026-what-actually-works-zh","2026 提示工程，真正有用的是什麼","2026-03-26T08:08:12.453028+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"05553086-6ed0-4758-81fd-6cab24b575e0","garry-tan-open-sources-claude-code-toolkit-zh","Garry Tan 開源 Claude Code 工具包","2026-03-26T08:26:20.068737+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"042a73a2-18a2-433d-9e8f-9802b9559aac","github-ai-projects-to-watch-in-2026-zh","2026 必看 20 個 GitHub AI 
專案","2026-03-26T08:28:09.619964+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"a5f94120-ac0d-4483-9a8b-63590071ac6a","claude-code-vs-cursor-2026-zh","Claude Code 與 Cursor 深度對比：202…","2026-03-26T13:27:14.279193+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"0975afa1-e0c7-4130-a20d-d890eaed995e","practical-github-guide-learning-ml-2026-zh","2026 機器學習入門 GitHub 實用指南","2026-03-27T01:16:49.712576+00:00",{"id":125,"slug":126,"title":127,"created_at":128},"bfdb467a-290f-4a80-b3a9-6f081afb6dff","aiml-2026-student-ai-ml-lab-repo-review-zh","AIML-2026：像課綱的學生實驗 Repo","2026-03-27T01:21:51.467798+00:00",{"id":130,"slug":131,"title":132,"created_at":133},"80cabc3e-09fc-4ff5-8f07-b8d68f5ae545","ai-trending-github-repos-and-research-feeds-zh","AI Trending：把 AI 資源收成一張表","2026-03-27T01:31:35.262183+00:00",{"id":135,"slug":136,"title":137,"created_at":138},"3ce6e6e2-bac5-463e-9f8d-45caabcc61f7","awesome-ai-for-science-research-tools-map-zh","AI 科研工具清單，開始像地圖了","2026-03-27T01:46:50.521945+00:00"]