[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-peft-llm-fine-tuning-without-full-retraining-zh":3,"article-related-peft-llm-fine-tuning-without-full-retraining-zh":31,"series-ai-agent-7315dc1e-d3c0-4888-8466-1328e8819be0":84},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":23,"views":27,"created_at":28,"published_at":29,"topic_cluster_id":30},"7315dc1e-d3c0-4888-8466-1328e8819be0","peft-llm-fine-tuning-without-full-retraining-zh","PEFT LoRA 微調 LLM 實作指南","\u003Cp data-speakable=\"summary\">這篇教你用 PEFT 和 LoRA 只\u003Ca href=\"\u002Fnews\u002Fllm-research-engineers-post-training-services-zh\">訓練\u003C\u002Fa>小型 adapter，完成 \u003Ca href=\"\u002Ftag\u002Fllm\">LLM\u003C\u002Fa> 微調、保存與部署。\u003C\u002Fp>\u003Cp>這篇給想在不重訓整個 LLM 的前提下，快速做領域微調的開發者。照著做完，你會得到一個可訓練的 LoRA adapter、可重複使用的訓練流程，以及一份能確認只有少量參數在更新的檢查結果。\u003C\u002Fp>\u003Cp>你也會知道 PEFT 為什麼適合 production，adapter 和 full fine-tuning 的差別是什麼，還能把訓練出的 adapter 掛回同一個 base model 做推論。\u003C\u002Fp>\u003Ch2>開始之前\u003C\u002Fh2>\u003Cul>\u003Cli>Python 3.10+\u003C\u002Fli>\u003Cli>PyTorch 2.1+\u003C\u002Fli>\u003Cli>Hugging Face Transformers\u003C\u002Fli>\u003Cli>Hugging Face PEFT\u003C\u002Fli>\u003Cli>CUDA GPU，至少 16 GB VRAM，適合跑小型 LoRA 範例\u003C\u002Fli>\u003Cli>Hugging Face 帳號與 access token，用於下載 gated model\u003C\u002Fli>\u003Cli>一個 pretrained causal LLM，例如 Llama、Mistral 或較小的開源模型\u003C\u002Fli>\u003Cli>Git 與終端機，環境可在 macOS、Linux 或 Windows Subsystem for Linux\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Step 1: 選定基礎模型\u003C\u002Fh2>\u003Cp>這一步的產出是「凍結的 base model」。PEFT 的核心是保留既有語言能力，只對任務相關行為做小幅調整，所以你要先選一個已經懂語言的 pretrained model。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781403475967-xlpz.png\" alt=\"PEFT LoRA 微調 LLM 
實作指南\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>先用較小的\u003Ca href=\"\u002Ftag\u002F開源模型\">開源模型\u003C\u002Fa>驗證流程，再往更大的模型擴充。這樣你可以先確認資料載入、訓練與儲存都正常，再處理更高的顯存需求。\u003C\u002Fp>\u003Cpre>\u003Ccode>from transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel_id = \"mistralai\u002FMistral-7B-v0.1\"\ntokenizer = AutoTokenizer.from_pretrained(model_id)\nmodel = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\")\n\nfor param in model.parameters():\n    param.requires_grad = False\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>你應該看到模型成功載入，而且所有\u003Ca href=\"\u002Fnews\u002Fllm-wiki-compiler-raw-sources-to-wiki-zh\">原始\u003C\u002Fa>權重都被標記為 frozen。若你印出幾個參數，\u003Ccode>requires_grad\u003C\u002Fcode> 應該是 \u003Ccode>False\u003C\u002Fcode>。\u003C\u002Fp>\u003Ch2>Step 2: 掛上 LoRA adapter\u003C\u002Fh2>\u003Cp>這一步的產出是「可訓練的 LoRA 模組」。LoRA 會在指定投影層加入低秩矩陣，讓模型只更新很小一部分參數，就能學到新任務行為。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781403474634-mv3r.png\" alt=\"PEFT LoRA 微調 LLM 實作指南\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>初學時先鎖定 attention projection，例如 \u003Ccode>q_proj\u003C\u002Fcode> 和 \u003Ccode>v_proj\u003C\u002Fcode>。這樣 adapter 夠小，訓練成本也低，通常已經能看到明顯效果。\u003C\u002Fp>\u003Cpre>\u003Ccode>from peft import LoraConfig, get_peft_model\n\nconfig = LoraConfig(\n    r=16,\n    lora_alpha=32,\n    lora_dropout=0.05,\n    target_modules=[\"q_proj\", \"v_proj\"],\n    task_type=\"CAUSAL_LM\",\n)\nmodel = get_peft_model(model, config)\nmodel.print_trainable_parameters()\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>你應該看到 trainable parameter 的報告，而且可訓練比例遠低於整個模型。健康的 LoRA 設定通常只會更新不到 1% 的參數。\u003C\u002Fp>\u003Ch2>Step 3: 整理任務資料\u003C\u002Fh2>\u003Cp>這一步的產出是「一致格式的訓練資料集」。PEFT 不能取代資料品質，它只是把需要更新的參數變少，所以資料仍然要能清楚對應你要的行為。\u003C\u002Fp>\u003Cp>把資料寫成固定模板，例如 prompt 和 ideal 
response。若你要做客服助理，就放客服問題與標準回覆；若你要做程式助手，就放指令與修正版輸出。\u003C\u002Fp>\u003Cp>格式要穩定，因為模型會從重複模式中學習。如果每幾筆資料的提示詞結構都不同，adapter 會更難收斂，評估也會變得模糊。\u003C\u002Fp>\u003Cp>你應該看到每筆樣本都能直接看出任務目標，而且輸入與輸出邊界清楚。只要人類一眼能判斷這筆資料在教什麼，資料集就算合格。\u003C\u002Fp>\u003Ch2>Step 4: 執行 adapter 訓練\u003C\u002Fh2>\u003Cp>這一步的產出是「只更新 LoRA 權重的訓練結果」。PEFT 的主要價值就在這裡：顯存壓力更低、可訓練參數更少、checkpoint 也更小。\u003C\u002Fp>\u003Cp>你可以用 Transformers 的 Trainer，或自己寫訓練迴圈。\u003Ca href=\"\u002Fnews\u002Fjensen-huang-lg-ai-cooperation-five-bets-zh\">重點\u003C\u002Fa>不是訓練框架，而是確認 base model 沒有被解凍，只有 adapter 在接收梯度。\u003C\u002Fp>\u003Cpre>\u003Ccode>from transformers import Trainer, TrainingArguments\n\nargs = TrainingArguments(\n    output_dir=\".\u002Fpeft-output\",\n    per_device_train_batch_size=2,\n    num_train_epochs=3,\n    learning_rate=2e-4,\n    fp16=True,\n)\n\ntrainer = Trainer(\n    model=model,\n    args=args,\n    train_dataset=train_dataset,\n)\ntrainer.train()\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>你應該看到 loss 逐步下降，而且 \u003Ca href=\"\u002Ftag\u002Fgpu\">GPU\u003C\u002Fa> 記憶體用量明顯低於 full fine-tuning。如果記憶體還是像整模型更新一樣暴增，先回頭檢查 base model 是否真的 frozen。\u003C\u002Fp>\u003Ch2>Step 5: 匯出並重載 adapter\u003C\u002Fh2>\u003Cp>這一步的產出是「可攜式 adapter 檔案」。這也是 PEFT 很適合 production 的原因之一，因為你通常只要保存小型 adapter，不必重新分發整個 base checkpoint。\u003C\u002Fp>\u003Cp>把 adapter 單獨存起來，之後在推論時再掛回同一個 base model。這樣你就能維持一份共同的基礎模型，同時切換多個不同任務的專用版本。\u003C\u002Fp>\u003Cpre>\u003Ccode>model.save_pretrained(\".\u002Fcustomer-support-lora\")\ntokenizer.save_pretrained(\".\u002Fcustomer-support-lora\")\n\n# Later\nfrom peft import PeftModel\nbase_model = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\")\nadapted_model = PeftModel.from_pretrained(base_model, \".\u002Fcustomer-support-lora\")\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>你應該看到 adapter 很快載入，推論結果也開始帶有任務特徵。如果輸出還是很泛用，通常是 adapter 和 base model 版本不一致。\u003C\u002Fp>\u003Ch2>Step 6: 比較 PEFT 方法\u003C\u002Fh2>\u003Cp>這一步的產出是「方法選擇判斷」。LoRA 很常是第一選擇，但 adapter、prompt tuning、prefix tuning 和 IA³ 
都是在不同限制下解同一類問題。\u003C\u002Fp>\u003Cp>如果你要一個品質與部署平衡都不錯的預設方案，先選 LoRA。若你想把更新量再壓低，可以考慮 prompt tuning 或 prefix tuning；若你偏好模組化架構，adapter 會更直觀。\u003C\u002Fp>\u003Cp>你現在應該能解釋 PEFT 為什麼有效：多數任務只需要調整模型行為，不需要重寫整個模型。這就是它能在有限算力下完成 LLM 微調的原因。\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>指標\u003C\u002Fth>\u003Cth>基準／優化前\u003C\u002Fth>\u003Cth>結果／優化後\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>可訓練參數\u003C\u002Ftd>\u003Ctd>7B 全量微調\u003C\u002Ftd>\u003Ctd>約 5M 到 20M，LoRA 套在 7B 模型\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>可訓練參數\u003C\u002Ftd>\u003Ctd>13B 全量微調\u003C\u002Ftd>\u003Ctd>約 10M 到 40M，LoRA 套在 13B 模型\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>可訓練參數\u003C\u002Ftd>\u003Ctd>70B 全量微調\u003C\u002Ftd>\u003Ctd>約 50M 到 200M，LoRA 套在 70B 模型\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Adapter 大小\u003C\u002Ftd>\u003Ctd>完整模型 checkpoint\u003C\u002Ftd>\u003Ctd>常見 production 情境下約 50 MB 到 200 MB\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>常見錯誤\u003C\u002Fh2>\u003Cul>\u003Cli>忘記凍結 base model。修法：在訓練前檢查 pretrained weights 的 \u003Ccode>requires_grad=False\u003C\u002Fcode>。\u003C\u002Fli>\u003Cli>選錯 target modules。修法：先看模型結構，確認 attention projection 的實際名稱，再套用 LoRA。\u003C\u002Fli>\u003Cli>base 與 adapter 版本不一致。修法：adapter 一律掛回訓練時使用的同一個 base model family 與 revision。\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>接下來可以看什麼\u003C\u002Fh2>\u003Cp>如果你已經能順利跑 LoRA，下一步可以看 adapter merging、QLoRA 的量化微調，以及把 adapter-only 模型和 full fine-tuning 在同一組任務上做評估比較。\u003C\u002Fp>","這篇教你用 PEFT 和 LoRA 只訓練小型 adapter，完成 LLM 微調、保存與部署。","dev.to","https:\u002F\u002Fdev.to\u002Fshrsv\u002Fpeft-explained-how-to-fine-tune-llms-without-retraining-billions-of-parameters-5cd9",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781403475967-xlpz.png","ai-agent","zh","4d6fc0c2-481a-48c6-9743-2f3f77945134",[17,18,19,20,21,22],"PEFT","LoRA","Transformers","PyTorch","Hugging Face","LLM 
fine-tuning",[24,25,26],"PEFT 的核心是只訓練少量 adapter 參數，而不是重訓整個 LLM。","LoRA 是最常見的 PEFT 方法，適合先從 attention projection 入手。","訓練後要確認 base model 凍結、adapter 可訓練，並把 adapter 單獨保存再重載。",0,"2026-06-14T02:17:26.268208+00:00","2026-06-14T02:17:26.259+00:00","e3b68196-9e64-4c18-a3b6-a73e73bfb367",{"tags":32,"relatedLang":43,"relatedPosts":47},[33,35,37,39,41],{"name":18,"slug":34},"lora",{"name":21,"slug":36},"hugging-face",{"name":17,"slug":38},"peft",{"name":20,"slug":40},"pytorch",{"name":42,"slug":42},"transformers",{"id":15,"slug":44,"title":45,"language":46},"peft-llm-fine-tuning-without-full-retraining-en","PEFT for LLM Fine-Tuning Without Full Retraining","en",[48,54,60,66,72,78],{"id":49,"slug":50,"title":51,"cover_image":52,"image_url":52,"created_at":53,"category":13},"7ea0ef5b-d12c-4b18-b8fd-6ae3de67c296","coinbase-ai-agent-accounts-strict-limits-zh","Coinbase 讓 AI 代理代交易與代支付是對的，但前提是嚴格限權","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781409758550-mjql.png","2026-06-14T04:02:15.334232+00:00",{"id":55,"slug":56,"title":57,"cover_image":58,"image_url":58,"created_at":59,"category":13},"5e2ed9f7-4240-429b-97c7-ffd31e4a45ee","llm-research-engineers-post-training-services-zh","LLM研究工程師把後訓練做成服務","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781402598646-2jzs.png","2026-06-14T02:02:46.765352+00:00",{"id":61,"slug":62,"title":63,"cover_image":64,"image_url":64,"created_at":65,"category":13},"09e34016-bbc0-4313-b090-2dbfdd6cf96a","fine-tuning-slms-turns-enterprise-ai-practical-zh","SLM 微調把企業 AI 
變可用","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781359406320-5jrq.png","2026-06-13T14:02:55.242488+00:00",{"id":67,"slug":68,"title":69,"cover_image":70,"image_url":70,"created_at":71,"category":13},"06a33326-5420-4e1d-99ff-233939652a44","aspire-microsoft-agent-framework-app-graph-zh","Aspire 把 Agent 圖譜收進一個 AppHost","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781353076983-n0ho.png","2026-06-13T12:17:30.314245+00:00",{"id":73,"slug":74,"title":75,"cover_image":76,"image_url":76,"created_at":77,"category":13},"40cd5d8d-c9fc-4883-b978-f7f757c14488","fable-5-claude-code-like-coworker-zh","Fable 5 讓 Claude Code 更像真同事","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781324309029-2n7r.png","2026-06-13T04:18:00.6602+00:00",{"id":79,"slug":80,"title":81,"cover_image":82,"image_url":82,"created_at":83,"category":13},"5bff363a-295a-47d3-911b-411f5f45e2bb","fine-tuning-methods-sft-lora-dpo-rlhf-grpo-zh","SFT、LoRA、DPO、RLHF、GRPO 選型指南","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781262197359-7rgb.png","2026-06-12T11:02:33.190744+00:00",[85,90,95,100,105,110,115,120,125,130],{"id":86,"slug":87,"title":88,"created_at":89},"4ae1e197-1d3d-4233-8733-eafe9cb6438b","claude-now-uses-your-pc-to-finish-tasks-zh","Claude 開始幫你操作電腦","2026-03-26T07:20:48.457387+00:00",{"id":91,"slug":92,"title":93,"created_at":94},"5bede67f-e21c-413d-9ab8-54a3c3d26227","googles-2026-ai-agent-report-decoded-zh","Google 2026 AI Agent 報告解讀","2026-03-26T11:15:22.651956+00:00",{"id":96,"slug":97,"title":98,"created_at":99},"2987d097-563f-46c7-b76f-b558d8ef7c2b","kimi-k25-review-stronger-still-not-legend-zh","Kimi K2.5 
評測：更強，但還不是神作","2026-03-27T07:15:55.277513+00:00",{"id":101,"slug":102,"title":103,"created_at":104},"95c9053b-e3f4-4cb5-aace-5c54f4c9e044","claude-code-controls-mac-desktop-zh","Claude Code 也能操控 Mac 了","2026-03-28T03:01:58.58121+00:00",{"id":106,"slug":107,"title":108,"created_at":109},"dc58e153-e3a8-4c06-9b96-1aa64eabbf5f","cloudflare-100x-faster-ai-agent-sandbox-zh","Cloudflare 的 AI 沙箱跑超快","2026-03-28T03:09:44.142236+00:00",{"id":111,"slug":112,"title":113,"created_at":114},"1c8afc56-253f-47a2-979f-1065ff072f2a","openai-backs-isara-agent-swarm-bet-zh","OpenAI 挺 Isara 的 agent swarm …","2026-03-28T03:15:27.513155+00:00",{"id":116,"slug":117,"title":118,"created_at":119},"7379b422-576e-45df-ad5a-d57a0d9dd467","openai-plan-automated-ai-researcher-zh","OpenAI 想做自動化 AI 研究員","2026-03-28T03:17:42.090548+00:00",{"id":121,"slug":122,"title":123,"created_at":124},"48c9889e-86df-450b-a356-e4a4b7c83c5b","harness-engineering-ai-agent-reliability-2026-zh","駕馭工程：從「馬具」到「作業系統」，AI Agent 可靠性的終極密碼","2026-03-31T06:42:53.556721+00:00",{"id":126,"slug":127,"title":128,"created_at":129},"96d8e8c8-1edd-475d-9145-b1e7a1b02b65","mcp-explained-from-prompts-to-production-zh","MCP 怎麼把提示詞變工作流","2026-04-01T09:24:39.321274+00:00",{"id":131,"slug":132,"title":133,"created_at":134},"f2ca7720-b471-4ce5-9336-2a9ac2a876fd","amazon-bedrock-agents-multi-agent-workflows-zh","Amazon Bedrock Agents 進入多代理工作流","2026-04-01T09:30:29.945429+00:00"]