PEFT vs Full Fine-Tuning
PEFT is the default for most LLM fine-tuning, while full fine-tuning fits edge cases needing deeper model change.

PEFT is the default for most LLM fine-tuning, while full fine-tuning fits edge cases needing deeper model change.
Comparing PEFT and full fine-tuning helps teams decide how to adapt large language models without overspending on GPUs or losing too much accuracy.
At a glance
Get the latest AI news in your inbox
Weekly picks of model releases, tools, and deep dives — no spam, unsubscribe anytime.
No spam. Unsubscribe at any time.
| Dimension | PEFT | Full fine-tuning |
|---|---|---|
| Trainable parameters on a 13B model | ~26M at r=16, about 0.2% | 13B, or 100% |
| Peak VRAM for 13B instruction tuning | ~56 GB for LoRA, ~18 GB for QLoRA | ~240 GB |
| Typical compute cost per run | ~$28 for LoRA, ~+$8 for QLoRA on a 13B Alpaca-style run | ~$190 on 8×A100 80GB |
| Accuracy gap vs full FT | ~0.2–0.8 pp for LoRA, ~1.1 pp for QLoRA on MMLU | Baseline |
| Serving footprint per customer variant | ~50–200 MB adapter, often merged or hot-swapped | ~26 GB model checkpoint |
| Latency overhead at inference | 0% merged, ~5–15% dynamic adapters | 0% model-specific overhead |
PEFT
PEFT works because most downstream tasks do not require rewriting the model from scratch. LoRA adds a low-rank update on top of frozen weights, so you are training a tiny slice of the network rather than every parameter. On a 13B model, that means roughly 26 million trainable parameters at r=16, which is small enough to fit on a single GPU and large enough to recover most of the performance of full fine-tuning.

The bigger advantage is operational, not just mathematical. QLoRA pushes the base model into 4-bit storage, then dequantizes it on the fly while training bf16 adapters. That makes previously awkward jobs practical on a 24 GB card, which is why PEFT became the default for teams that need to iterate quickly, support many customer variants, or work without a large GPU cluster.
Full fine-tuning
Full fine-tuning still matters when the task is not a small adaptation but a deep behavioral shift. If you are doing initial instruction tuning, extending the model to a new language, or chasing the last 1 to 2 percentage points on a frontier benchmark, updating all weights gives you more room to move. That extra freedom is exactly why it costs so much more in memory and compute.

The catch is that full fine-tuning scales badly in production. A 13B model in fp16 is already around 26 GB before optimizer states, gradients, and activations are counted, which is why a single run can require multiple A100s. It is the right tool when accuracy or model transformation is the priority, but it is a poor default when you need frequent retraining, per-customer customization, or low-friction deployment.
When to pick what
If you are a product team adapting a strong base model to a domain like legal, support, finance, or code, pick PEFT. It gives you most of the quality gain with far less GPU spend, and it makes it realistic to maintain separate adapters for different customers or internal teams.
If you are a research team or frontier lab trying to change the model’s core behavior, pick full fine-tuning. It is also the better choice when you need to alter language coverage, modality, or tokenization behavior in a way that low-rank updates are unlikely to capture well.
If you are cost-constrained, moving fast, or planning to serve many variants, PEFT is the safer default. If you only have one model to ship and the last bit of benchmark gain matters more than infrastructure cost, full fine-tuning can still be justified.
Default to PEFT, especially LoRA or QLoRA, unless you are doing foundational training or need a truly deep model rewrite.
// Related Articles
- [IND]
Efraín Juárez’s path from player to Liga MX coach
- [IND]
Why Denver’s hailstorm is a reminder to treat weather like infrastruc…
- [IND]
How to Hire an MLOps Engineer in 2026
- [IND]
4 takeaways from Cloudflare’s AI-first reset
- [IND]
5 ways Harriet Sperling echoes Kate Middleton
- [IND]
5 kOps release notes for Kubernetes admins