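[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-why-routing-belongs-at-the-center-of-model-serving-zh":3,"article-related-why-routing-belongs-at-the-center-of-model-serving-zh":26,"series-industry-54b3fd97-c8e6-4b92-b87b-40913f024775":78},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":11,"views":23,"created_at":24,"published_at":25,"topic_cluster_id":11},"54b3fd97-c8e6-4b92-b87b-40913f024775","why-routing-belongs-at-the-center-of-model-serving-zh","Why routing belongs at the center of model serving","\u003Cp data-speakable=\"summary\">Routing should be the single entry point for model serving, because it speeds up \u003Ca href=\"\u002Fnews\u002Fai-models-2026-which-one-to-use-zh\">model\u003C\u002Fa> iteration and turns the serving layer into part of the product's capabilities.\u003C\u002Fp>\u003Cp>I argue that routing belongs at the center of model serving rather than being treated as an add-on. The reason is simple: when the number of \u003Ca href=\"\u002Fnews\u002Fhycop-modular-interpretable-pde-surrogates-zh\">models\u003C\u002Fa>, experiments, and product lines all grow together, what determines a team's speed is not the models themselves but how traffic is allocated, shifted, and withdrawn.\u003C\u002Fp>\u003Cp>Netflix's description of its internal ML serving platform offers strong real-world evidence. After it made a single \u003Ca href=\"\u002Ftag\u002Fapi\">API\u003C\u002Fa> the entry point, new versions iterated faster and entirely new ML products became easier to support. That means routing does more than forward requests: it is the control plane that absorbs change into the platform.\u003C\u002Fp>\u003Ch2>First argument: routing is not a detail, it is the control plane for iteration speed\u003C\u002Fh2>\u003Cp>Whenever a team does a rollout, rollback, canary, or A\u002FB test, routing directly affects delivery speed. If every model or product line has its own serving path, engineers have to reimplement the same traffic rules, and maintenance costs climb quickly as versions multiply.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777882252815-32q6.png\" alt=\"Why routing belongs at the center of model serving\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Take three common paths: recommendation, search, and personalization. Many companies start by wiring each one to a different serving 
endpoint. Once new and old models have to coexist, the problem is no longer compute but coordination cost, because every revision has to be updated in sync across multiple services, monitoring setups, and rollback mechanisms.\u003C\u002Fp>\u003Cp>The value of central routing is that it consolidates these shared operations into a single entry point. When traffic decisions live in one layer, model authors no longer have to rebuild traffic-splitting logic, PMs can define experiment boundaries more clearly, and the whole team ships changes noticeably faster.\u003C\u002Fp>\u003Ch2>Second argument: routing turns the platform from “pipeline” into “product surface”\u003C\u002Fh2>\u003Cp>Many people think of model serving as backend infrastructure, but once there are more than a handful of models, serving becomes part of the product's capabilities. Netflix notes that a single API made both new versions and new products easier to launch, which shows that routing no longer just sends a request to some model; it defines how the product is assembled.\u003C\u002Fp>\u003Cp>The difference is especially visible at scale. When one platform serves several teams and there is no unified entry point, each team builds its own rules, its own exception handling, and its own monitoring. The result is a mesh that is hard to understand, where any change requires cross-team negotiation.\u003C\u002Fp>\u003Cp>By contrast, if routing sits at the center, the platform can offer a consistent contract. That not only lowers the error rate but also makes new features easier to productize, because a new model, policy, or ensemble can plug in through the same path instead of opening yet another service surface.\u003C\u002Fp>\u003Ch2>Third argument: a single entry point is not only faster, it opens up new product design space\u003C\u002Fh2>\u003Cp>The real strategic value of routing is not just safer deployment but the ability to make decisions based on context. User segment, device type, region, language, experiment status, and even risk level can all become conditions for choosing a model or policy, which turns serving from static to dynamic.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777882247993-vqqx.png\" alt=\"Why routing belongs at the center of model serving\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>This capability directly shapes the product. For example, the same recommendation request can go to different models for new users, returning users, and high-value users, or take a lightweight version in low-latency scenarios and a heavier ensemble in high-accuracy scenarios. These are not purely engineering questions; they are product design questions.\u003C\u002Fp>\u003Cp>From an organizational perspective, this also changes how teams divide the work. When routing can carry business logic, experiment logic, and model selection logic, product teams no longer have to wait for a low-level refactor every time they want to validate a new idea. The platform then does more than supply models: it amplifies experiment density and the pace of feature innovation.\u003C\u002Fp>\u003Ch2>Fourth argument: without central routing, the platform fragments at scale\u003C\u002Fh2>\u003Cp>The short-term benefit of decentralized serving is freedom; the long-term cost is fragmentation. When every team uses its own endpoint and its own traffic policy, the same kind of request ends up with different semantics in different products, monitoring becomes hard to align, and problems become harder to trace.\u003C\u002Fp>\u003Cp>This fragmentation directly drags down organizational efficiency. Engineers often spend more time aligning rules, patching differences, and writing adapter layers than actually improving model quality. If a company ships new models every month, this duplicated cost compounds like interest and eventually eats into delivery speed.\u003C\u002Fp>\u003Cp>Centering routing is therefore not about centralizing power but about reducing duplication. Moving shared logic back into the platform layer lets each product line keep what differentiates it while sparing every path from reinventing a serving system.\u003C\u002Fp>\u003Ch2>What the other side might say: central routing becomes a bottleneck\u003C\u002Fh2>\u003Cp>The strongest objection is that a single entry point easily becomes a single bottleneck. If the routing 
layer is over-governed, over-abstracted, or locked down by one platform team, other teams get stuck on scheduling, permissions, and process, and a system that looks tidy ends up slower in practice.\u003C\u002Fp>\u003Cp>Another reasonable concern is the failure domain. When all traffic depends on one routing layer, a failure in that layer has a larger blast radius than a failure in any single model service. For high-traffic products this concentration risk cannot be brushed aside, especially in multi-tenant or cross-team environments.\u003C\u002Fp>\u003Cp>These problems do not refute central routing, though; they define how it should be built. The right approach is not a do-everything monolith but a routing layer limited to shared responsibilities such as traffic control and model selection, with clear escape hatches and observability. The goal of centralization is standardization, not cramming everything into one black box.\u003C\u002Fp>\u003Ch2>What you can do\u003C\u002Fh2>\u003Cp>If you are an engineer, start by taking stock of which serving logic is really just duplicated traffic rules, and consolidate instead of opening a new endpoint wherever you can. If you are a PM, treat routing as part of the product architecture, because it directly affects experiment speed, how features launch, and the cost of rollback. If you are a founder, set up a single entry point while the model platform is still simple; once every team has grown its own serving path, refactoring only gets more painful.\u003C\u002Fp>","Routing should be the single entry point for model serving, because it speeds up model iteration and turns the serving layer into part of the product's capabilities.","netflixtechblog.com","https:\u002F\u002Fnetflixtechblog.com\u002Fstate-of-routing-in-model-serving-16e22fe18741?gi=a87006c83174",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777882252815-32q6.png","industry","zh","639da4b0-a393-409d-930c-2546ec8a63ab",[17,18,19,20,21,22],"model serving","routing","ML 平台","流量控制","A\u002FB 測試","平台架構",4,"2026-05-04T08:10:33.607394+00:00","2026-05-04T08:10:33.49+00:00",{"tags":27,"relatedLang":37,"relatedPosts":41},[28,31,33,35,36],{"name":29,"slug":30},"Model Serving","model-serving",{"name":21,"slug":32},"ab-測試",{"name":19,"slug":34},"ml-平台",{"name":18,"slug":18},{"name":20,"slug":20},{"id":15,"slug":38,"title":39,"language":40},"why-routing-belongs-at-the-center-of-model-serving-en","Why routing belongs at the center of model serving","en",[42,48,54,60,66,72],{"id":43,"slug":44,"title":45,"cover_image":46,"image_url":46,"created_at":47,"category":13},"79976a7e-665f-453e-9dbf-51057693d2ca","7-bell-media-shows-to-watch-in-2026-zh","7 個 Bell Media 2026 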
必看節目","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780692485484-t21g.png","2026-06-05T20:47:34.111999+00:00",{"id":49,"slug":50,"title":51,"cover_image":52,"image_url":52,"created_at":53,"category":13},"4bd978e9-e607-43e7-b963-139e188062c6","5-things-to-know-about-the-littlest-hobo-remake-zh","5 個關於《The Littlest Hobo》重拍的重點","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780691576397-li25.png","2026-06-05T20:32:24.158702+00:00",{"id":55,"slug":56,"title":57,"cover_image":58,"image_url":58,"created_at":59,"category":13},"afbb6e4b-c594-43fc-8662-0b6a52d2c40d","why-seth-rogen-career-proves-comedy-producer-game-zh","為什麼 Seth Rogen 的職涯證明喜劇已經是製作人的遊戲","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780690675993-pfpe.png","2026-06-05T20:17:22.576458+00:00",{"id":61,"slug":62,"title":63,"cover_image":64,"image_url":64,"created_at":65,"category":13},"2bfcdc63-b43a-4e32-984e-27c2bb613927","seth-rogen-rebooting-the-littlest-hobo-zh","塞斯羅根重啟《The Littlest Hobo》","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780689796623-omce.png","2026-06-05T20:02:55.585503+00:00",{"id":67,"slug":68,"title":69,"cover_image":70,"image_url":70,"created_at":71,"category":13},"f76e68d8-71a0-4ca8-810e-e84ce2bbc67a","why-anthropic-ipo-proves-ai-needs-wall-street-zh","為什麼 Anthropic 的 IPO 計畫證明 AI 仍需要華爾街","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780685271889-tdln.png","2026-06-05T18:47:25.278507+00:00",{"id":73,"slug":74,"title":75,"cover_image":76,"image_url":76,"created_at":77,"category":13},"278f8342-b7a8-48a3-b124-1a8c161a9076","5-ways-microsoft-and-mayo-clinic-are-using-ai-zh","5 種 Microsoft 與 Mayo Clinic 的 AI 
用法","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780681676255-rvye.png","2026-06-05T17:47:24.854111+00:00",[79,84,89,94,99,104,109,114,119,124],{"id":80,"slug":81,"title":82,"created_at":83},"ee073da7-28b3-4752-a319-5a501459fb87","ai-in-2026-what-actually-matters-now-zh","2026 AI 真正重要的事","2026-03-26T07:09:12.008134+00:00",{"id":85,"slug":86,"title":87,"created_at":88},"83bd1795-8548-44c9-9a7e-de50a0923f71","trump-ai-framework-power-speech-state-preemption-zh","川普 AI 框架瞄準電力、言論與州權","2026-03-26T07:12:18.695466+00:00",{"id":90,"slug":91,"title":92,"created_at":93},"ea6be18b-c903-4e54-97b7-5f7447a612e0","nvidia-gtc-2026-big-ai-announcements-zh","NVIDIA GTC 2026 重點拆解","2026-03-26T07:14:26.62638+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"4bcec76f-4c36-4daa-909f-54cd702f7c93","claude-users-spreading-out-and-getting-better-zh","Claude 用戶更分散，也更會用","2026-03-26T07:22:52.325888+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"bd903b15-2473-4178-9789-b7557816e535","openclaw-raises-hard-question-for-ai-models-zh","OpenClaw 逼問 AI 模型價值","2026-03-26T07:24:54.707486+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"eeac6b9e-ad9d-4831-8eec-8bba3f9bca6a","gap-google-gemini-checkout-fashion-search-zh","Gap 把結帳搬進 Gemini","2026-03-26T07:28:23.937768+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"0740e53f-605d-4d57-8601-c10beb126f3c","google-pushes-gemini-transition-to-march-2026-zh","Google 把 Gemini 轉換延到 2026 年 3…","2026-03-26T07:30:12.825269+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"e660d801-2421-4529-8fa9-86b82b066990","metas-llama-4-benchmark-scandal-gets-worse-zh","Meta Llama 4 分數風波又擴大","2026-03-26T07:34:21.156421+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"183f9e7c-e143-40bb-a6d5-67ba84a3a8bc","accenture-mistral-ai-sovereign-enterprise-deal-zh","Accenture 攜手 Mistral AI 賣主權 
AI","2026-03-26T07:38:14.818906+00:00",{"id":125,"slug":126,"title":127,"created_at":128},"191d9b1b-768a-478c-978c-dd7431a38149","mistral-ai-faces-its-hardest-year-yet-zh","Mistral AI 迎來最硬的一年","2026-03-26T07:40:23.716374+00:00"]