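[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-wei-shen-me-lu-you-cai-shi-mo-xing-fu-wu-de-zhen-zheng-ping-zh":3,"article-related-wei-shen-me-lu-you-cai-shi-mo-xing-fu-wu-de-zhen-zheng-ping-zh":29,"series-industry-5b27896f-ad48-4a9a-8b6e-823568d8c669":82},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":11},"5b27896f-ad48-4a9a-8b6e-823568d8c669","wei-shen-me-lu-you-cai-shi-mo-xing-fu-wu-de-zhen-zheng-ping-zh","Why routing is the real bottleneck in model serving","\u003Cp data-speakable=\"summary\">The main constraint in model serving is not inference itself but the routing decision: who gets sent, when, and to which model and replica is what determines latency, cost, and stability.\u003C\u002Fp>\u003Cp>In my view, the biggest misjudgment in modern model serving is treating routing as infrastructure housekeeping. It is not. Once traffic scales up, where a request is sent directly determines latency, utilization, cost, and even whether the whole system can stay stable at peak. Netflix is right to focus on serving routing, because today's serving stack is no longer only about running a model fast; it has to pick the right model, the right replica, and the right path for every request.\u003C\u002Fp>\u003Ch2>First argument: routing determines the cost structure first\u003C\u002Fh2>\u003Cp>The most direct reason is simple: every misrouted request is inference cost wasted outright. If a request lands on a cold-starting replica, an already saturated \u003Ca href=\"\u002Ftag\u002Fgpu\">GPU\u003C\u002Fa>, or a model version that does not fit the task at hand, the money is burned before the model starts producing any value. For a high-traffic system, these small routing inefficiencies are amplified into real infrastructure \u003Ca href=\"\u002Fnews\u002Fmicrosoft-80-billion-ai-capex-decade-zh\">spending\u003C\u002Fa>.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778278838578-nms7.png\" alt=\"Why routing is the real bottleneck in model serving\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>This is not an abstract inference. Suppose a system handles on the order of ten million requests a day. If routing lets just 5% of that traffic fall onto a suboptimal path, that is about 500,000 requests a day, and over time it can add up to millions of extra queueing events, retries, or fallbacks. A company like Netflix, which serves personalized experiences, cannot get by on plain round-robin; routing has to account for model availability, traffic patterns, and operational constraints at the same time, because the tail latency and resource imbalance caused by sending requests to the wrong destination are, at bottom, a cost problem.\u003C\u002Fp>\u003Ch2>Second argument: routing is already part of model quality\u003C\u002Fh2>\u003Cp>The second reason is \u003Ca href=\"\u002Fnews\u002Fmatz-ai-ruby-native-compiler-matters-zh\">more important\u003C\u002Fa>: routing 
affects not only performance but also output quality. In a modern serving system, the router often decides which specialized model takes over, which version gets promoted, or when to switch to a fallback. That makes the routing policy itself part of the product experience. With a weak router, what users experience is a weaker system, even if the underlying model is strong.\u003C\u002Fp>\u003Cp>This is especially visible in systems that use ensembles, canaries, or per-segment model selection. A recommendation model that performs extremely well for one audience can miss badly when moved to another segment. If routing understands the request context, it preserves relevance; if it dispatches crudely, the differences get flattened and result quality drops. In other words, routing is not a layer separate from model intelligence; it is one of the mechanisms that lets that intelligence actually show up in production.\u003C\u002Fp>\u003Ch2>What the other side might say\u003C\u002Fh2>\u003Cp>The strongest objection is that routing is easy to over-engineer. Many teams have no need for sophisticated placement logic, dynamic policies, or multi-stage decisions. If the product is small, with one model, one hardware tier, and modest traffic, a plain load balancer plus a single deployment target is enough. Spending a lot of time on routing at that point really is a waste.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778278854553-wejo.png\" alt=\"Why routing is the real bottleneck in model serving\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>At small scale, this objection holds. If the serving footprint is small, the router should stay simple. But that is an argument against premature complexity, not against routing as a discipline. Once a team has multiple models, rollout strategies, hardware differences, or traffic-shaping needs, routing stops being an optional add-on and becomes a core mechanism for protecting reliability and cost. The mistake is not doing routing too early; it is pretending you will never need it.\u003C\u002Fp>\u003Ch2>What you can do\u003C\u002Fh2>\u003Cp>If you are an engineer or platform owner, treat routing as a first-class subsystem and hold it to the same level of scrutiny as model training and deployment. Measure tail latency, replica saturation, fallback rate, and the cost of each route; design for policy changes rather than static load balancing only. If you are a PM or founder, define routing requirements alongside model requirements, because the serving strategy you choose changes both the user experience and the cloud bill. Get the control plane right first; do not wait for scale to force you to catch up.\u003C\u002Fp>","The main constraint in model serving is not inference itself but the routing decision: who gets sent, when, and to which model and replica is what determines latency, cost, and stability.","netflixtechblog.com","https:\u002F\u002Fnetflixtechblog.com\u002Fstate-of-routing-in-model-serving-16e22fe18741?gi=a78a5e08192d",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778278838578-nms7.png","industry","zh","7ab95b03-7468-4fef-93bc-0f6f13e61b25",[17,18,19,20,21],"model serving","routing","inference cost","tail latency","control 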
plane",[23,24,25],"Routing is not a supporting player; it is the main bottleneck in model serving.","Routing affects cost, latency, and output quality at the same time.","Small deployments can stay simple, but once multiple models and multiple policies coexist, routing has to be promoted to a core capability.",3,"2026-05-08T22:20:22.020009+00:00","2026-05-08T22:20:21.957+00:00",{"tags":30,"relatedLang":41,"relatedPosts":45},[31,34,36,37,39],{"name":32,"slug":33},"Model Serving","model-serving",{"name":19,"slug":35},"inference-cost",{"name":18,"slug":18},{"name":21,"slug":38},"control-plane",{"name":20,"slug":40},"tail-latency",{"id":15,"slug":42,"title":43,"language":44},"why-routing-is-the-real-bottleneck-in-model-serving-en","Why routing is the real bottleneck in model serving","en",[46,52,58,64,70,76],{"id":47,"slug":48,"title":49,"cover_image":50,"image_url":50,"created_at":51,"category":13},"0231f359-f786-4e6c-8104-d3fae443f98b","4-chipotle-promo-details-for-members-zh","4 key details of the Chipotle member promo","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780540375071-5xa3.png","2026-06-04T02:32:19.54736+00:00",{"id":53,"slug":54,"title":55,"cover_image":56,"image_url":56,"created_at":57,"category":13},"39e4c1b2-4a8d-4baf-86eb-f65d4f6c3624","why-chipotle-53000-burrito-stunt-smart-brand-marketing-zh","Why Chipotle's 53,000-burrito stunt is smart brand marketing","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780538579630-nkln.png","2026-06-04T02:02:28.454411+00:00",{"id":59,"slug":60,"title":61,"cover_image":62,"image_url":62,"created_at":63,"category":13},"53955aa8-9120-41c1-b342-6ca40e24b6ee","apples-gemini-deal-turns-cloud-ai-into-local-ai-zh","Apple turns cloud AI into local 
AI","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780535908899-g9ua.png","2026-06-04T01:18:03.319604+00:00",{"id":65,"slug":66,"title":67,"cover_image":68,"image_url":68,"created_at":69,"category":13},"a1119341-06e2-47ed-95f0-192f89c277a7","sec-draft-plan-puts-crypto-rules-first-zh","SEC draft plan puts crypto rules first","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780534108464-yi2d.png","2026-06-04T00:48:00.749142+00:00",{"id":71,"slug":72,"title":73,"cover_image":74,"image_url":74,"created_at":75,"category":13},"87a8a5d1-7284-4c58-aa53-9f353d5a2800","why-jensen-huang-keynote-bigger-than-nvidia-zh","Why Jensen Huang's keynote is bigger than Nvidia","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780530468418-zi6e.png","2026-06-03T23:47:22.014083+00:00",{"id":77,"slug":78,"title":79,"cover_image":80,"image_url":80,"created_at":81,"category":13},"b5d4728c-ee2a-4df6-93c2-42e814d51ea1","why-smci-rally-is-about-supply-not-just-ai-zh","Why the SMCI rally is mainly a supply story, not just Agentic AI","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780529579886-q16r.png","2026-06-03T23:32:28.626882+00:00",[83,88,93,98,103,108,113,118,123,128],{"id":84,"slug":85,"title":86,"created_at":87},"ee073da7-28b3-4752-a319-5a501459fb87","ai-in-2026-what-actually-matters-now-zh","What actually matters in AI in 2026","2026-03-26T07:09:12.008134+00:00",{"id":89,"slug":90,"title":91,"created_at":92},"83bd1795-8548-44c9-9a7e-de50a0923f71","trump-ai-framework-power-speech-state-preemption-zh","Trump's AI framework targets power, speech, and state powers","2026-03-26T07:12:18.695466+00:00",{"id":94,"slug":95,"title":96,"created_at":97},"ea6be18b-c903-4e54-97b7-5f7447a612e0","nvidia-gtc-2026-big-ai-announcements-zh","NVIDIA GTC 2026 
highlights, broken down","2026-03-26T07:14:26.62638+00:00",{"id":99,"slug":100,"title":101,"created_at":102},"4bcec76f-4c36-4daa-909f-54cd702f7c93","claude-users-spreading-out-and-getting-better-zh","Claude users are more spread out, and better at using it","2026-03-26T07:22:52.325888+00:00",{"id":104,"slug":105,"title":106,"created_at":107},"bd903b15-2473-4178-9789-b7557816e535","openclaw-raises-hard-question-for-ai-models-zh","OpenClaw presses the question of AI model value","2026-03-26T07:24:54.707486+00:00",{"id":109,"slug":110,"title":111,"created_at":112},"eeac6b9e-ad9d-4831-8eec-8bba3f9bca6a","gap-google-gemini-checkout-fashion-search-zh","Gap moves checkout into Gemini","2026-03-26T07:28:23.937768+00:00",{"id":114,"slug":115,"title":116,"created_at":117},"0740e53f-605d-4d57-8601-c10beb126f3c","google-pushes-gemini-transition-to-march-2026-zh","Google pushes the Gemini transition to 2026/3…","2026-03-26T07:30:12.825269+00:00",{"id":119,"slug":120,"title":121,"created_at":122},"e660d801-2421-4529-8fa9-86b82b066990","metas-llama-4-benchmark-scandal-gets-worse-zh","Meta's Llama 4 benchmark controversy widens again","2026-03-26T07:34:21.156421+00:00",{"id":124,"slug":125,"title":126,"created_at":127},"183f9e7c-e143-40bb-a6d5-67ba84a3a8bc","accenture-mistral-ai-sovereign-enterprise-deal-zh","Accenture teams up with Mistral AI to sell sovereign AI","2026-03-26T07:38:14.818906+00:00",{"id":129,"slug":130,"title":131,"created_at":132},"191d9b1b-768a-478c-978c-dd7431a38149","mistral-ai-faces-its-hardest-year-yet-zh","Mistral AI faces its hardest year yet","2026-03-26T07:40:23.716374+00:00"]