[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-peft-vs-full-fine-tuning-en":3,"article-related-peft-vs-full-fine-tuning-en":33,"series-industry-2a33bea3-0362-4c05-90c8-181ad6ff11b9":84},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":25,"views":29,"created_at":30,"published_at":31,"topic_cluster_id":32},"2a33bea3-0362-4c05-90c8-181ad6ff11b9","peft-vs-full-fine-tuning-en","PEFT vs Full Fine-Tuning","\u003Cp data-speakable=\"summary\">PEFT is the default for most \u003Ca href=\"\u002Ftag\u002Fllm\">LLM\u003C\u002Fa> fine-tuning, while full fine-tuning fits edge cases needing deeper model change.\u003C\u002Fp>\u003Cp>Comparing \u003Ca href=\"https:\u002F\u002Fbhavishyapandit9.substack.com\u002Fp\u002Fllm-fine-tuning-at-scale-peft-vs\">PEFT\u003C\u002Fa> and full fine-tuning helps teams decide how to adapt large \u003Ca href=\"\u002Fnews\u002Faudio-language-models-arbitration-reversals-en\">language models\u003C\u002Fa> without overspending on GPUs or losing too much accuracy.\u003C\u002Fp>\u003Ch2>At a glance\u003C\u002Fh2>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Dimension\u003C\u002Fth>\u003Cth>PEFT\u003C\u002Fth>\u003Cth>Full fine-tuning\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>Trainable parameters on a 13B model\u003C\u002Ftd>\u003Ctd>~26M at r=16, about 0.2%\u003C\u002Ftd>\u003Ctd>13B, or 100%\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Peak VRAM for 13B instruction tuning\u003C\u002Ftd>\u003Ctd>~56 GB for LoRA, ~18 GB for QLoRA\u003C\u002Ftd>\u003Ctd>~240 GB\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Typical compute cost per run\u003C\u002Ftd>\u003Ctd>~$28 for LoRA, ~+$8 for QLoRA on a 13B Alpaca-style run\u003C\u002Ftd>\u003Ctd>~$190 on 8×A100 80GB\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Accuracy gap vs full FT\u003C\u002Ftd>\u003Ctd>~0.2–0.8 pp for LoRA, ~1.1 pp for QLoRA on MMLU\u003C\u002Ftd>\u003Ctd>Baseline\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Serving footprint per customer variant\u003C\u002Ftd>\u003Ctd>~50–200 MB adapter, often merged or hot-swapped\u003C\u002Ftd>\u003Ctd>~26 GB model checkpoint\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Latency overhead at inference\u003C\u002Ftd>\u003Ctd>0% merged, ~5–15% dynamic adapters\u003C\u002Ftd>\u003Ctd>0% model-specific overhead\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>PEFT\u003C\u002Fh2>\u003Cp>PEFT works because most downstream tasks do not require rewriting the model from scratch. LoRA adds a low-rank update on top of frozen weights, so you are training a tiny slice of the network rather than every parameter. On a 13B model, that means roughly 26 million trainable parameters at r=16, which is small enough to fit on a single \u003Ca href=\"\u002Ftag\u002Fgpu\">GPU\u003C\u002Fa> and large enough to recover most of the performance of full fine-tuning.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780603380563-5547.png\" alt=\"PEFT vs Full Fine-Tuning\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The bigger advantage is operational, not just mathematical. QLoRA pushes the base model into 4-bit storage, then dequantizes it on the fly while training bf16 adapters. That makes previously awkward jobs practical on a 24 GB card, which is why PEFT became the default for teams that need to iterate quickly, support many customer variants, or work without a large GPU cluster.\u003C\u002Fp>\u003Ch2>Full fine-tuning\u003C\u002Fh2>\u003Cp>Full fine-tuning still matters when the task is not a small adaptation but a deep behavioral shift. If you are doing initial instruction tuning, extending the model to a new language, or chasing the last 1 to 2 percentage points on a frontier \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa>, updating all weights gives you more room to move. That extra freedom is exactly why it costs so much more in memory and compute.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780603376737-jh84.png\" alt=\"PEFT vs Full Fine-Tuning\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The catch is that full fine-tuning scales badly in production. A 13B model in fp16 is already around 26 GB before optimizer states, gradients, and activations are counted, which is why a single run can require multiple A100s. It is the right tool when accuracy or model transformation is the priority, but it is a poor default when you need frequent retraining, per-customer customization, or low-friction deployment.\u003C\u002Fp>\u003Ch2>When to pick what\u003C\u002Fh2>\u003Cp>If you are a product team adapting a strong base model to a domain like legal, support, finance, or code, pick PEFT. It gives you most of the quality gain with far less GPU spend, and it makes it realistic to maintain separate adapters for different customers or internal teams.\u003C\u002Fp>\u003Cp>If you are a research team or frontier lab trying to change the model’s core behavior, pick full fine-tuning. It is also the better choice when you need to alter language coverage, modality, or tokenization behavior in a way that low-rank updates are unlikely to capture well.\u003C\u002Fp>\u003Cp>If you are cost-constrained, moving fast, or planning to serve many variants, PEFT is the safer default. If you only have one model to ship and the last bit of benchmark gain matters more than infrastructure cost, full fine-tuning can still be justified.\u003C\u002Fp>\u003Cp>Default to PEFT, especially LoRA or QLoRA, unless you are doing foundational training or need a truly deep model rewrite.\u003C\u002Fp>","PEFT is the default for most LLM fine-tuning, while full fine-tuning fits edge cases needing deeper model change.","bhavishyapandit9.substack.com","https:\u002F\u002Fbhavishyapandit9.substack.com\u002Fp\u002Fllm-fine-tuning-at-scale-peft-vs",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780603380563-5547.png","industry","en","d1218662-3c24-4bd5-8fdd-826164864369",[17,18,19,20,21,22,23,24],"PEFT","LoRA","QLoRA","full fine-tuning","LLM fine-tuning","parameter-efficient fine-tuning","GPU memory","model serving",[26,27,28],"LoRA trains about 0.2% of a 13B model and usually lands within 0.2–0.8 percentage points of full fine-tuning.","QLoRA cuts memory enough to fine-tune large models on a single 24 GB to 48 GB GPU, but it adds some inference and training overhead.","Full fine-tuning is still the better fit for foundational instruction tuning, major behavior changes, and frontier accuracy work.",0,"2026-06-04T20:02:32.377539+00:00","2026-06-04T20:02:32.367+00:00","d19fc184-5852-4c4d-9ec0-db0c4841ac17",{"tags":34,"relatedLang":45,"relatedPosts":49},[35,37,39,41,43],{"name":19,"slug":36},"qlora",{"name":18,"slug":38},"lora",{"name":17,"slug":40},"peft",{"name":20,"slug":42},"full-fine-tuning",{"name":21,"slug":44},"llm-fine-tuning",{"id":15,"slug":46,"title":47,"language":48},"peft-vs-full-fine-tuning-zh","PEFT vs 全量微調","zh",[50,56,62,68,73,78],{"id":51,"slug":52,"title":53,"cover_image":54,"image_url":54,"created_at":55,"category":13},"f89355bb-7095-4791-8bcf-13303ccd7e6b","why-model-release-feeds-matter-more-en","Why model-release feeds matter more than model-launch posts","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780611464470-pz16.png","2026-06-04T22:17:15.959267+00:00",{"id":57,"slug":58,"title":59,"cover_image":60,"image_url":60,"created_at":61,"category":13},"da398326-d4e1-4926-9317-cfaba566a173","microsoft-first-reasoning-model-tracker-plain-english-en","Microsoft’s first reasoning model tracker in plain English","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780610594200-59z1.png","2026-06-04T22:02:49.9426+00:00",{"id":63,"slug":64,"title":65,"cover_image":66,"image_url":66,"created_at":67,"category":13},"8878baaa-a7f7-4abd-9237-0538f22da9c3","efrain-juarez-player-to-liga-mx-coach-en","Efraín Juárez’s path from player to Liga MX coach","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780606984860-bm5r.png","2026-06-04T21:02:35.624434+00:00",{"id":69,"slug":70,"title":71,"cover_image":11,"image_url":11,"created_at":72,"category":13},"314cc463-5cfd-4d9e-a40f-75143750525f","denver-hailstorm-weather-infrastructure-risk-en","Why Denver’s hailstorm is a reminder to treat weather like infrastruc…","2026-06-04T19:32:32.587927+00:00",{"id":74,"slug":75,"title":76,"cover_image":11,"image_url":11,"created_at":77,"category":13},"1b8c8ebf-4edc-444f-a2f5-35b33d600746","how-to-hire-mlops-engineer-2026-en","How to Hire an MLOps Engineer in 2026","2026-06-04T19:17:27.009669+00:00",{"id":79,"slug":80,"title":81,"cover_image":82,"image_url":82,"created_at":83,"category":13},"07758d84-d1c5-4aab-8278-3f01e2db1e27","4-takeaways-from-cloudflares-ai-first-reset-en","4 takeaways from Cloudflare’s AI-first reset","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780590033061-ad0p.png","2026-06-04T16:17:29.263918+00:00",[85,90,95,100,105,110,115,120,125,130],{"id":86,"slug":87,"title":88,"created_at":89},"d35a1bd9-e709-412e-a2df-392df1dc572a","ai-impact-2026-developments-market-en","AI's Impact in 2026: Key Developments and Market Shifts","2026-03-25T16:20:33.205823+00:00",{"id":91,"slug":92,"title":93,"created_at":94},"5ed27921-5fd6-492e-8c59-78393bf37710","trumps-ai-legislative-framework-en","Trump's AI Legislative Framework: What's Inside?","2026-03-25T16:22:20.005325+00:00",{"id":96,"slug":97,"title":98,"created_at":99},"e454a642-f03c-4794-b185-5f651aebbaca","nvidia-gtc-2026-key-highlights-innovations-en","NVIDIA GTC 2026: Key Highlights and Innovations","2026-03-25T16:22:47.882615+00:00",{"id":101,"slug":102,"title":103,"created_at":104},"0ebb5b16-774a-4922-945d-5f2ce1df5a6d","claude-usage-diversifies-learning-curves-en","Claude Usage Diversifies, Learning Curves Emerge","2026-03-25T16:25:50.770376+00:00",{"id":106,"slug":107,"title":108,"created_at":109},"69934e86-2fc5-4280-8223-7b917a48ace8","openclaw-ai-commoditization-concerns-en","OpenClaw's Rise Raises Concerns of AI Model Commoditization","2026-03-25T16:26:30.582047+00:00",{"id":111,"slug":112,"title":113,"created_at":114},"b4b2575b-2ac8-46b2-b90e-ab1d7c060797","google-gemini-ai-rollout-2026-en","Google's Gemini AI Rollout Extended to 2026","2026-03-25T16:28:14.808842+00:00",{"id":116,"slug":117,"title":118,"created_at":119},"6e18bc65-42ae-4ad0-b564-67d7f66b979e","meta-llama4-fabricated-results-scandal-en","Meta's Llama 4 Scandal: Fabricated AI Test Results Unveiled","2026-03-25T16:29:15.482836+00:00",{"id":121,"slug":122,"title":123,"created_at":124},"bf888e9d-08be-4f47-996c-7b24b5ab3500","accenture-mistral-ai-deployment-en","Accenture and Mistral AI Team Up for AI Deployment","2026-03-25T16:31:01.894655+00:00",{"id":126,"slug":127,"title":128,"created_at":129},"5382b536-fad2-49c6-ac85-9eb2bae49f35","mistral-ai-high-stakes-2026-en","Mistral AI: Facing High Stakes in 2026","2026-03-25T16:31:39.941974+00:00",{"id":131,"slug":132,"title":133,"created_at":134},"9da3d2d6-b669-4971-ba1d-17fdb3548ed5","cursors-meteoric-rise-pressures-en","Cursor's Meteoric Rise Faces Industry Pressures","2026-03-25T16:32:21.899217+00:00"]