[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-bigquery-vectorized-python-udfs-arrow-zh":3,"article-related-bigquery-vectorized-python-udfs-arrow-zh":30,"series-tools-4860bd59-d197-4c32-a4aa-e3f53aa08d7a":73},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"4860bd59-d197-4c32-a4aa-e3f53aa08d7a","bigquery-vectorized-python-udfs-arrow-zh","Implementing Arrow Vectorized Python UDFs in BigQuery","\u003Cp data-speakable=\"summary\">This guide shows how to enable Arrow RecordBatch vectorized Python UDFs in BigQuery: set up the connection, create the function, call it from SQL, and verify batch execution.\u003C\u002Fp>\u003Cp>This guide is for BigQuery developers, especially those who want to move a Python UDF from row-by-row processing to batch processing. By the end you will have a working Arrow RecordBatch UDF, a SQL query that calls it \u003Ca href=\"\u002Fnews\u002Fapple-intelligence-ai-everyday-experiences-zh\">directly\u003C\u002Fa>, and a way to confirm that it really runs in batch mode.\u003C\u002Fp>\u003Cp>The walkthrough follows the new UDF path described in the BigQuery release notes and the \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fgoogleapis\u002Fpython-bigquery\">python-bigquery GitHub repo\u003C\u002Fa>. The goal is to cut per-row processing overhead by running the work in a way that suits batch workloads.\u003C\u002Fp>\u003Ch2>Before you start\u003C\u002Fh2>\u003Cul>\u003Cli>A Google Cloud project with BigQuery enabled\u003C\u002Fli>\u003Cli>Billing enabled on the project\u003C\u002Fli>\u003Cli>Access to BigQuery Studio, or permission to run SQL jobs\u003C\u002Fli>\u003Cli>A Cloud resource connection for BigQuery Python UDFs\u003C\u002Fli>\u003Cli>Python 3.10+\u003C\u002Fli>\u003Cli>Apache Arrow 14+\u003C\u002Fli>\u003Cli>Google Cloud CLI 470+\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Step 1: Confirm the vectorized UDF feature\u003C\u002Fh2>\u003Cp>Goal: confirm that you are targeting the Python UDF path BigQuery has publicly released, so you do not waste effort on older behavior.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782027159471-91jd.png\" alt=\"Implementing Arrow Vectorized Python UDFs in BigQuery\" class=\"rounded-xl w-full\" 
loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Open the \u003Ca href=\"https:\u002F\u002Fdocs.cloud.google.com\u002Fbigquery\u002Fdocs\u002Frelease-notes\">BigQuery release notes\u003C\u002Fa>, find the announcement that Python UDFs are generally available, and confirm that the new path receives batched data as Apache Arrow RecordBatch objects.\u003C\u002Fp>\u003Cp>Check: you should see the Python UDF GA release note and be able to open the BigQuery query editor, which confirms that the project and your permissions are in working order.\u003C\u002Fp>\u003Ch2>Step 2: Create a Cloud resource connection\u003C\u002Fh2>\u003Cp>Goal: give the BigQuery Python UDF a secure execution connection so the function runs in the right region with the right permissions.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782027163401-jiv2.png\" alt=\"Implementing Arrow Vectorized Python UDFs in BigQuery\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Create or reuse a Cloud resource connection in BigQuery, in the same location as the dataset. Then grant the connection's service account the IAM permissions the UDF needs at run time.\u003C\u002Fp>\u003Cpre>\u003Ccode># Example using the bq CLI (the BigQuery Console works too); REGION, PROJECT_ID, and CONNECTION_ID are placeholders\nbq mk --connection --location=REGION --project_id=PROJECT_ID --connection_type=CLOUD_RESOURCE CONNECTION_ID\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Check: the connection should appear in BigQuery, and its service account should hold the expected roles.\u003C\u002Fp>\u003Ch2>Step 3: Write the Arrow RecordBatch UDF\u003C\u002Fh2>\u003Cp>Goal: create a Python UDF that actually consumes batched input, which is the core of the vectorized performance gain.\u003C\u002Fp>\u003Cp>Start with a minimal version: the Python function receives an Apache Arrow RecordBatch, applies a simple transformation, and returns a result in a format BigQuery accepts. Hold off on third-party packages other than pyarrow until the data flow is verified end to end.\u003C\u002Fp>\u003Cpre>\u003Ccode>CREATE OR REPLACE FUNCTION `my_dataset.normalize_text_batch`(input STRING)\nRETURNS STRING\nLANGUAGE PYTHON\n-- Placeholder: point this at the connection created in Step 2\nWITH CONNECTION `PROJECT_ID.REGION.CONNECTION_ID`\nOPTIONS (\n  runtime_version = 'python-3.11',\n  entry_point = 'normalize_text_batch',\n  packages = ['pyarrow']\n)\nAS r'''\nimport pyarrow as pa\nimport pyarrow.compute as pc\n\ndef normalize_text_batch(batch: pa.RecordBatch) -> pa.Array:\n    # Assumption: the STRING argument arrives as column 0 of the batch and\n    # the runtime expects one output value per input row.\n    # Illustrative normalization: trim whitespace and lowercase the whole\n    # column in one vectorized pass instead of looping over rows.\n    return pc.utf8_lower(pc.utf8_trim_whitespace(batch.column(0)))\n''';\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Check: the function should save successfully and appear in the dataset, and the editor should accept the runtime and package settings.\u003C\u002Fp>\u003Ch2>Step 4: Call the UDF from SQL\u003C\u002Fh2>\u003Cp>Goal: put the UDF into a \u003Ca 
href=\"\u002Fnews\u002Fai-agents-software-finance-risk-zh\">real\u003C\u002Fa> query and confirm that BigQuery invokes the batched Python \u003Ca href=\"\u002Fnews\u002Fai-code-review-rollout-with-human-oversight-zh\">code\u003C\u002Fa> at query time.\u003C\u002Fp>\u003Cp>Run a SELECT against a small sample table first, then scale up to the production dataset. That way you validate the output before checking whether the batch path behaves correctly.\u003C\u002Fp>\u003Cpre>\u003Ccode>SELECT\n  my_dataset.normalize_text_batch(col) AS normalized_value\nFROM my_dataset.sample_table\nLIMIT 100;\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Check: the query should complete, the result grid should show the transformed values, and the job details should show no Python runtime errors.\u003C\u002Fp>\u003Ch2>Step 5: Verify batch behavior and performance\u003C\u002Fh2>\u003Cp>Goal: confirm that the UDF does not merely run but actually lowers per-row overhead by working in batches.\u003C\u002Fp>\u003Cp>Compare this query's job details against a row-by-row baseline, looking at the number of Python invocations, the per-row overhead, and the total elapsed time. If your logic is CPU-bound, use a larger sample, because the batch advantage becomes more visible.\u003C\u002Fp>\u003Cp>Check: both versions should return the same output, with the batched version showing better execution characteristics on the same data volume.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Metric\u003C\u002Fth>\u003Cth>Baseline (before)\u003C\u002Fth>\u003Cth>Result (after)\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>Python execution mode\u003C\u002Ftd>\u003Ctd>Row-by-row UDF\u003C\u002Ftd>\u003Ctd>Arrow RecordBatch vectorized UDF\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Unit of data transfer\u003C\u002Ftd>\u003Ctd>Single row\u003C\u002Ftd>\u003Ctd>Batch\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Per-row overhead\u003C\u002Ftd>\u003Ctd>Higher\u003C\u002Ftd>\u003Ctd>Lower\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Data volume needed to see the difference\u003C\u002Ftd>\u003Ctd>Not noticeable on small samples\u003C\u002Ftd>\u003Ctd>Easier to see on larger samples\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>Common mistakes\u003C\u002Fh2>\u003Cul>\u003Cli>The connection, dataset, and UDF are not in the same region. Fix: put all three in the same location, then redeploy.\u003C\u002Fli>\u003Cli>The Python runtime and package versions are incompatible. Fix: pin runtime_version first, then choose pyarrow and other package versions that support it.\u003C\u002Fli>\u003Cli>Benchmarking with a very small query. Fix: use a larger table or a repeated workload so the batch advantage shows up.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>What to explore next\u003C\u002Fh2>\u003Cp>As a next step, extend this UDF to use third-party packages, add error handling, and compare it against native SQL transformations to find the best approach for each workload.\u003C\u002Fp>","This guide shows how to enable Arrow RecordBatch vectorized Python UDFs in BigQuery: set up the connection, create the function, call it from SQL, 
and verify batch execution.","docs.cloud.google.com","https:\u002F\u002Fdocs.cloud.google.com\u002Fbigquery\u002Fdocs\u002Frelease-notes",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782027159471-91jd.png","tools","zh","a0d9f17c-ff77-49c2-bf56-d78cebffc801",[17,18,19,20,21],"BigQuery","Python UDF","Apache Arrow","RecordBatch","Google Cloud CLI",[23,24,25],"Check the BigQuery release notes and region settings first, then create a working Cloud resource connection.","The UDF must take Arrow RecordBatch batch input for you to verify the vectorized path.","Validate correctness with a small SQL sample first, then use job details to compare the execution characteristics of the batched and row-by-row versions.",0,"2026-06-21T07:32:19.997774+00:00","2026-06-21T07:32:19.987+00:00","2280f033-e3ad-4cc4-8f0e-10a6d08600f5",{"tags":31,"relatedLang":32,"relatedPosts":36},[],{"id":15,"slug":33,"title":34,"language":35},"bigquery-vectorized-python-udfs-arrow-en","BigQuery vectorized Python UDFs with Arrow","en",[37,43,49,55,61,67],{"id":38,"slug":39,"title":40,"cover_image":41,"image_url":41,"created_at":42,"category":13},"642eb00a-f3cd-422f-9d29-e113dc82e5d3","apples-gemini-powered-siri-seo-stakes-zh","Apple Siri 接上 Gemini，SEO 壓力升高","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782011869139-s32t.png","2026-06-21T03:17:28.557729+00:00",{"id":44,"slug":45,"title":46,"cover_image":47,"image_url":47,"created_at":48,"category":13},"03ca3c65-1597-4a56-aaf5-09949ec0b995","databricks-custom-model-serving-endpoints-zh","Databricks 端點讓你少猜","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782003811403-1hhn.png","2026-06-21T01:03:09.88432+00:00",{"id":50,"slug":51,"title":52,"cover_image":53,"image_url":53,"created_at":54,"category":13},"1210fbca-7d00-4a0d-9be2-39163079dbb0","go-turns-team-chaos-into-boring-builds-zh","Go 
把團隊混亂變成穩定建置","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781971401098-zdl4.png","2026-06-20T16:02:53.571588+00:00",{"id":56,"slug":57,"title":58,"cover_image":59,"image_url":59,"created_at":60,"category":13},"44069991-6152-4495-879f-c4e727541300","fde-sales-engineering-playbook-zh","FDE把售前和工程拧成一股绳","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781965996042-v9rb.png","2026-06-20T14:32:50.484812+00:00",{"id":62,"slug":63,"title":64,"cover_image":65,"image_url":65,"created_at":66,"category":13},"7beaabe3-5421-4e2b-a42a-d1a7b669be12","deploy-minimax-m3-with-vllm-openai-api-zh","用 vLLM 部署 MiniMax M3 並開啟 OpenAI API","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781954276176-k5fw.png","2026-06-20T11:17:30.019598+00:00",{"id":68,"slug":69,"title":70,"cover_image":71,"image_url":71,"created_at":72,"category":13},"fe9fecba-d6ae-4293-af38-e68e6c2c111b","namastack-turns-outbox-pain-into-reliable-events-zh","Namastack 把 outbox 變穩定事件流","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781949794069-sfg2.png","2026-06-20T10:02:49.479466+00:00",[74,79,84,89,94,99,104,109,114,119],{"id":75,"slug":76,"title":77,"created_at":78},"855cd52f-6fab-46cc-a7c1-42195e8a0de4","surepath-real-time-mcp-policy-controls-zh","SurePath 推出即時 MCP 政策控管","2026-03-26T07:57:40.77233+00:00",{"id":80,"slug":81,"title":82,"created_at":83},"9b19ab54-edef-4dbd-9ce4-a51e4bae4ebb","mcp-in-2026-the-ai-tool-layer-teams-use-zh","2026 年 MCP：團隊真的在用的 AI 工具層","2026-03-26T08:01:46.589694+00:00",{"id":85,"slug":86,"title":87,"created_at":88},"af9c46c3-7a28-410b-9f04-32b3de30a68c","prompting-in-2026-what-actually-works-zh","2026 
提示工程，真正有用的是什麼","2026-03-26T08:08:12.453028+00:00",{"id":90,"slug":91,"title":92,"created_at":93},"05553086-6ed0-4758-81fd-6cab24b575e0","garry-tan-open-sources-claude-code-toolkit-zh","Garry Tan 開源 Claude Code 工具包","2026-03-26T08:26:20.068737+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"042a73a2-18a2-433d-9e8f-9802b9559aac","github-ai-projects-to-watch-in-2026-zh","2026 必看 20 個 GitHub AI 專案","2026-03-26T08:28:09.619964+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"a5f94120-ac0d-4483-9a8b-63590071ac6a","claude-code-vs-cursor-2026-zh","Claude Code 與 Cursor 深度對比：202…","2026-03-26T13:27:14.279193+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"0975afa1-e0c7-4130-a20d-d890eaed995e","practical-github-guide-learning-ml-2026-zh","2026 機器學習入門 GitHub 實用指南","2026-03-27T01:16:49.712576+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"bfdb467a-290f-4a80-b3a9-6f081afb6dff","aiml-2026-student-ai-ml-lab-repo-review-zh","AIML-2026：像課綱的學生實驗 Repo","2026-03-27T01:21:51.467798+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"80cabc3e-09fc-4ff5-8f07-b8d68f5ae545","ai-trending-github-repos-and-research-feeds-zh","AI Trending：把 AI 資源收成一張表","2026-03-27T01:31:35.262183+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"3ce6e6e2-bac5-463e-9f8d-45caabcc61f7","awesome-ai-for-science-research-tools-map-zh","AI 科研工具清單，開始像地圖了","2026-03-27T01:46:50.521945+00:00"]