[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-peft-llm-fine-tuning-without-full-retraining-en":3,"article-related-peft-llm-fine-tuning-without-full-retraining-en":30,"series-ai-agent-4d6fc0c2-481a-48c6-9743-2f3f77945134":83},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"4d6fc0c2-481a-48c6-9743-2f3f77945134","peft-llm-fine-tuning-without-full-retraining-en","PEFT for LLM Fine-Tuning Without Full Retraining","\u003Cp data-speakable=\"summary\">PEFT fine-tunes \u003Ca href=\"\u002Ftag\u002Fllms\">LLMs\u003C\u002Fa> by training small adapter layers instead of all model weights.\u003C\u002Fp>\u003Cp>This guide is for developers who want to customize a large language model without paying the cost of full retraining. By the end, you will know the core PEFT idea, the main techniques, and how to set up a LoRA-based fine-tuning workflow that keeps the base model frozen.\u003C\u002Fp>\u003Cp>You will also be able to explain why PEFT is practical in production, what changes when you use adapters instead of full fine-tuning, and how to check that your training setup is actually updating only the small set of trainable parameters.\u003C\u002Fp>\u003Ch2>Before you start\u003C\u002Fh2>\u003Cul>\u003Cli>Python 3.10+\u003C\u002Fli>\u003Cli>PyTorch 2.1+\u003C\u002Fli>\u003Cli>Hugging Face Transformers\u003C\u002Fli>\u003Cli>Hugging Face PEFT\u003C\u002Fli>\u003Cli>CUDA-capable GPU with at least 16 GB VRAM for a small LoRA demo\u003C\u002Fli>\u003Cli>Hugging Face account and access token if you plan to download gated models\u003C\u002Fli>\u003Cli>A pretrained causal LLM such as Llama, Mistral, or a smaller open model\u003C\u002Fli>\u003Cli>Git and a terminal on macOS, Linux, or Windows Subsystem for 
Linux\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Step 1: Identify the PEFT target model\u003C\u002Fh2>\u003Cp>Your first outcome is a frozen base model that will stay unchanged during training. PEFT works best when you start from a pretrained model that already understands language well and only needs task-specific adaptation.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781403469215-8tu4.png\" alt=\"PEFT for LLM Fine-Tuning Without Full Retraining\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Choose a model that matches your hardware and use case. For a first run, use a smaller open model so you can verify the training flow before scaling up.\u003C\u002Fp>\u003Cpre>\u003Ccode>from transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel_id = \"mistralai\u002FMistral-7B-v0.1\"\ntokenizer = AutoTokenizer.from_pretrained(model_id)\n# device_map=\"auto\" places the weights on the available devices (requires the accelerate package)\nmodel = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\")\n\n# Freeze every base weight so that only the adapters added in Step 2 can train\nfor param in model.parameters():\n    param.requires_grad = False\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>You should see the model load successfully and all base parameters marked as frozen. If you print a few parameter flags, they should show \u003Ccode>False\u003C\u002Fcode> for \u003Ccode>requires_grad\u003C\u002Fcode>.\u003C\u002Fp>\u003Ch2>Step 2: Add LoRA adapter modules\u003C\u002Fh2>\u003Cp>Your second outcome is a small set of trainable layers that learns the task adaptation. 
LoRA is the most common PEFT method. It injects trainable low-rank matrices into selected projection layers instead of updating the original weights.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781403470865-h527.png\" alt=\"PEFT for LLM Fine-Tuning Without Full Retraining\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Target attention projections such as \u003Ccode>q_proj\u003C\u002Fcode> and \u003Ccode>v_proj\u003C\u002Fcode> to start. That keeps the adapter small while still giving the model enough capacity to learn new behavior.\u003C\u002Fp>\u003Cpre>\u003Ccode>from peft import LoraConfig, get_peft_model\n\nconfig = LoraConfig(\n    r=16,  # rank of the low-rank update matrices\n    lora_alpha=32,  # scaling factor; the update is scaled by lora_alpha \u002F r\n    lora_dropout=0.05,  # dropout applied to the adapter input during training\n    target_modules=[\"q_proj\", \"v_proj\"],  # layer names to wrap with LoRA\n    task_type=\"CAUSAL_LM\",\n)\nmodel = get_peft_model(model, config)\nmodel.print_trainable_parameters()\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>You should see a trainable-parameter report showing that only a small fraction of the model is updated. With this configuration, that number should be well below 1% of the full model.\u003C\u002Fp>\u003Ch2>Step 3: Prepare task data for adaptation\u003C\u002Fh2>\u003Cp>Your third outcome is a clean training set that teaches the adapter the exact behavior you want. PEFT does not remove the need for good data. It only reduces how many parameters you must update.\u003C\u002Fp>\u003Cp>Create examples that reflect the style, domain, or instruction pattern you want the model to learn. For customer support, that could mean support tickets and ideal responses. For code assistants, that could mean prompts and corrected completions.\u003C\u002Fp>\u003Cp>Keep the format consistent so the adapter learns a stable mapping. 
If your prompt template changes every few rows, the training run becomes harder to interpret and evaluate.\u003C\u002Fp>\u003Cp>You should see examples that look uniform and task-specific, with clear input and output pairs. If you can read a sample and immediately tell what behavior it teaches, your dataset is in good shape.\u003C\u002Fp>\u003Ch2>Step 4: Train only the adapter weights\u003C\u002Fh2>\u003Cp>Your fourth outcome is a working fine-tuning run that updates the adapter while leaving the base model untouched. This is where PEFT delivers its main cost savings: less memory, fewer trainable weights, and smaller checkpoints.\u003C\u002Fp>\u003Cp>Use your usual training loop or a trainer from Transformers. The key check is that only adapter parameters require gradients, while the frozen model remains static.\u003C\u002Fp>\u003Cpre>\u003Ccode>from transformers import Trainer, TrainingArguments\n\nargs = TrainingArguments(\n    output_dir=\".\u002Fpeft-output\",\n    per_device_train_batch_size=2,\n    num_train_epochs=3,\n    learning_rate=2e-4,\n    fp16=True,  # mixed-precision training; requires a CUDA GPU\n)\n\n# train_dataset is assumed to be the tokenized dataset you prepared in Step 3\ntrainer = Trainer(\n    model=model,\n    args=args,\n    train_dataset=train_dataset,\n)\ntrainer.train()\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>You should see loss values decrease over time and \u003Ca href=\"\u002Ftag\u002Fgpu\">GPU\u003C\u002Fa> memory use stay lower than in a full fine-tuning run. If memory still spikes like a full-model update, recheck that the base model is frozen and that only LoRA layers are trainable.\u003C\u002Fp>\u003Ch2>Step 5: Save and reuse the adapter\u003C\u002Fh2>\u003Cp>Your fifth outcome is a portable adapter you can ship without redistributing the full model. This is one of PEFT's biggest production advantages because the adapter is usually tiny compared with the base checkpoint.\u003C\u002Fp>\u003Cp>Save the adapter separately, then load it on top of the same base model during \u003Ca href=\"\u002Ftag\u002Finference\">inference\u003C\u002Fa>. 
That lets you keep one foundation model and swap in many specialized behaviors.\u003C\u002Fp>\u003Cpre>\u003Ccode># Saves only the adapter weights and config, not the base model\nmodel.save_pretrained(\".\u002Fcustomer-support-lora\")\ntokenizer.save_pretrained(\".\u002Fcustomer-support-lora\")\n\n# Later, at inference time: reload the same base model, then attach the saved adapter\nfrom peft import PeftModel\nbase_model = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\")\nadapted_model = PeftModel.from_pretrained(base_model, \".\u002Fcustomer-support-lora\")\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>The adapter should load quickly, and inference should produce task-specific responses. If the output still looks generic, the adapter may not match the base model version you trained against.\u003C\u002Fp>\u003Ch2>Step 6: Compare LoRA with other PEFT methods\u003C\u002Fh2>\u003Cp>Your final outcome is the ability to choose the right PEFT method for the job. LoRA is popular, but adapters, prompt tuning, prefix tuning, and IA³ all solve the same problem with different tradeoffs.\u003C\u002Fp>\u003Cp>Use LoRA when you want a strong default choice with good quality and simple deployment. Consider prompt or prefix tuning when you want even lighter updates, and use adapters when you want a more modular architecture inside the network.\u003C\u002Fp>\u003Cp>You should now be able to explain why PEFT works: most tasks need a behavioral adjustment, not a rewrite of the entire model. 
That makes it a practical way to fine-tune LLMs under real compute limits.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Metric\u003C\u002Fth>\u003Cth>Before\u002FBaseline\u003C\u002Fth>\u003Cth>After\u002FResult\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>Trainable parameters\u003C\u002Ftd>\u003Ctd>7B full fine-tuning\u003C\u002Ftd>\u003Ctd>~5M to ~20M with LoRA on a 7B model\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Trainable parameters\u003C\u002Ftd>\u003Ctd>13B full fine-tuning\u003C\u002Ftd>\u003Ctd>~10M to ~40M with LoRA on a 13B model\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Trainable parameters\u003C\u002Ftd>\u003Ctd>70B full fine-tuning\u003C\u002Ftd>\u003Ctd>~50M to ~200M with LoRA on a 70B model\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Adapter size\u003C\u002Ftd>\u003Ctd>Full model checkpoint\u003C\u002Ftd>\u003Ctd>~50 MB to ~200 MB adapter in common production setups\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>Common mistakes\u003C\u002Fh2>\u003Cul>\u003Cli>Forgetting to freeze the base model. Fix: verify \u003Ccode>requires_grad=False\u003C\u002Fcode> for pretrained weights before training starts.\u003C\u002Fli>\u003Cli>Choosing the wrong target modules. Fix: inspect the model architecture and confirm the names of attention projection layers before applying LoRA.\u003C\u002Fli>\u003Cli>Using mismatched base and adapter versions. 
Fix: always load the adapter onto the exact base model family and revision used during training.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>What's next\u003C\u002Fh2>\u003Cp>Once you are comfortable with LoRA, explore adapter merging, quantized fine-tuning with QLoRA, and evaluation workflows that compare adapter-only models against full fine-tuning on the same task set.\u003C\u002Fp>","PEFT lets developers fine-tune LLMs by training small adapter layers instead of all weights.","dev.to","https:\u002F\u002Fdev.to\u002Fshrsv\u002Fpeft-explained-how-to-fine-tune-llms-without-retraining-billions-of-parameters-5cd9",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781403469215-8tu4.png","ai-agent","en","7315dc1e-d3c0-4888-8466-1328e8819be0",[17,18,19,20,21],"PEFT","LoRA","LLM fine-tuning","Hugging Face Transformers","PyTorch",[23,24,25],"PEFT fine-tunes large models by training a small set of new parameters instead of all weights.","LoRA is the most common PEFT method and usually targets attention projection layers.","Adapter-based workflows reduce memory use, checkpoint size, and deployment overhead.",0,"2026-06-14T02:17:26.696413+00:00","2026-06-14T02:17:26.689+00:00","c58956f2-0e6f-4be5-b68a-39eda67428b3",{"tags":31,"relatedLang":42,"relatedPosts":46},[32,34,36,38,40],{"name":18,"slug":33},"lora",{"name":17,"slug":35},"peft",{"name":20,"slug":37},"hugging-face-transformers",{"name":19,"slug":39},"llm-fine-tuning",{"name":21,"slug":41},"pytorch",{"id":15,"slug":43,"title":44,"language":45},"peft-llm-fine-tuning-without-full-retraining-zh","PEFT LoRA 微調 LLM 實作指南","zh",[47,53,59,65,71,77],{"id":48,"slug":49,"title":50,"cover_image":51,"image_url":51,"created_at":52,"category":13},"88192de5-5bda-4eba-ae2a-157d4bbea8d7","coinbase-ai-agent-accounts-strict-limits-en","Coinbase is right to let AI agents trade and spend, with strict 
limits","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781409759613-rhzp.png","2026-06-14T04:02:15.747337+00:00",{"id":54,"slug":55,"title":56,"cover_image":57,"image_url":57,"created_at":58,"category":13},"39f54361-7d76-4dfe-be99-dcae84f18a07","llm-research-engineers-post-training-services-en","LLM research engineers turn post-training into services","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781402606334-iyoh.png","2026-06-14T02:02:47.274885+00:00",{"id":60,"slug":61,"title":62,"cover_image":63,"image_url":63,"created_at":64,"category":13},"00cabbf4-05e7-440c-be15-b8f441a1506f","fine-tuning-slms-turns-enterprise-ai-practical-en","Fine-Tuning SLMs Turns Enterprise AI Practical","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781359408003-mj9d.png","2026-06-13T14:02:55.855964+00:00",{"id":66,"slug":67,"title":68,"cover_image":69,"image_url":69,"created_at":70,"category":13},"50d67ff2-698e-4ac1-9b5f-9233550bdc00","aspire-microsoft-agent-framework-app-graph-en","Aspire ties Microsoft Agent Framework into one app graph","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781353081127-r8l2.png","2026-06-13T12:17:30.899796+00:00",{"id":72,"slug":73,"title":74,"cover_image":75,"image_url":75,"created_at":76,"category":13},"65872119-5c63-409f-b8f9-338096299326","fable-5-claude-code-like-coworker-en","Fable 5 让 Claude Code 
更像真同事","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781324307625-of8c.png","2026-06-13T04:18:01.203421+00:00",{"id":78,"slug":79,"title":80,"cover_image":81,"image_url":81,"created_at":82,"category":13},"9abeb68f-5750-43b5-baff-d454f58068f0","fine-tuning-methods-sft-lora-dpo-rlhf-grpo-en","Fine-Tuning Methods: SFT, LoRA, DPO, RLHF, GRPO","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781262188430-9te1.png","2026-06-12T11:02:33.676197+00:00",[84,89,94,99,104,109,114,119,124,129],{"id":85,"slug":86,"title":87,"created_at":88},"03db8de8-8dc2-4ac1-9cf7-898782efbb1f","anthropic-claude-ai-agent-task-automation-en","Anthropic's Claude AI Agent: A New Era of Task Automation","2026-03-25T16:25:06.513026+00:00",{"id":90,"slug":91,"title":92,"created_at":93},"045d1abc-190d-4594-8c95-91e2a26f0c5a","googles-2026-ai-agent-report-decoded-en","Google’s 2026 AI Agent Report, Decoded","2026-03-26T11:15:23.046616+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"e64aba21-254b-4f93-aa21-837484bb52ec","kimi-k25-review-stronger-still-not-legend-en","Kimi K2.5 review: stronger, still not a legend","2026-03-27T07:15:55.385951+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"30dfb781-a1b2-4add-aebe-b3df40247c37","claude-code-controls-mac-desktop-en","Claude Code now controls your Mac desktop","2026-03-28T03:01:59.384091+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"254405b6-7833-4800-8e13-f5196deefbe6","cloudflare-100x-faster-ai-agent-sandbox-en","Cloudflare’s 100x Faster AI Agent Sandbox","2026-03-28T03:09:44.356437+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"04f29b7f-9b91-4306-89a7-97d725e6e1ba","openai-backs-isara-agent-swarm-bet-en","OpenAI backs Isara’s agent-swarm 
bet","2026-03-28T03:15:27.849766+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"3b0bf479-e4ae-4703-9666-721a7e0cdb91","openai-plan-automated-ai-researcher-en","OpenAI’s plan for an automated AI researcher","2026-03-28T03:17:42.312819+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"fe91bce0-b85d-4efa-a207-24ae9939c29f","harness-engineering-ai-agent-reliability-2026","Harness Engineering: From Bridle to Operating System, The Missing Link in AI Agent Reliability","2026-03-31T06:36:55.648751+00:00",{"id":125,"slug":126,"title":127,"created_at":128},"7a09007d-820f-43b3-8607-8ad1bfcb94c8","mcp-explained-from-prompts-to-production-en","MCP Explained: From Prompts to Production","2026-04-01T09:24:40.089177+00:00",{"id":130,"slug":131,"title":132,"created_at":133},"116d5ee9-a4f1-4b5a-aac5-5d035dd22bbe","amazon-bedrock-agents-multi-agent-workflows-en","Amazon Bedrock Agents Gets Multi-Agent Workflows","2026-04-01T09:30:30.197685+00:00"]