[RSCH] 6 min readOraCore Editors

How to Prevent Catastrophic Forgetting in LLM Fine-Tuning

Use Anchored Weight Decay to reduce prior-task drift during LLM fine-tuning.

Share LinkedIn
How to Prevent Catastrophic Forgetting in LLM Fine-Tuning

Use Anchored Weight Decay to reduce prior-task drift during LLM fine-tuning.

This guide is for ML engineers and researchers who fine-tune large language models and need to preserve prior capabilities while adding new skills.

By following the steps below, you will have a practical workflow for reducing catastrophic forgetting during continual post-training with Evolution Strategies, plus a way to verify whether your model is drifting or recovering over time.

Before you start

Get the latest AI news in your inbox

Weekly picks of model releases, tools, and deep dives — no spam, unsubscribe anytime.

No spam. Unsubscribe at any time.

  • Access to an LLM fine-tuning environment, such as a GPU workstation or cloud training job.
  • Python 3.10+.
  • PyTorch 2.1+.
  • One or more evaluation benchmarks for prior tasks, such as HellaSwag or your own held-out validation set.
  • An Evolution Strategies implementation or research codebase from the paper and repo you are extending.
  • GitHub repository for the research reference: GitHub and the source article at Cognizant.

Step 1: Baseline prior-task scores

Your first outcome is a clean before-and-after baseline for every task you want to preserve. Measure the model on prior tasks before any new fine-tuning starts so you can tell whether later score drops are real forgetting or just normal training noise.

How to Prevent Catastrophic Forgetting in LLM Fine-Tuning

Record task-level metrics such as accuracy, exact match, or pass rate, and save them with the base checkpoint name. If you are using HellaSwag, keep the same prompt format, decoding settings, and evaluation seed across runs.

When you rerun the baseline, you should see the same scores within a small margin. If the numbers swing widely, fix your evaluation setup before changing the model.

Step 2: Run a short ES fine-tuning trial

Your next outcome is a controlled training trace that shows whether the model drifts early and recovers later. Start with a small Evolution Strategies trial on the new target task so you can inspect the full trajectory instead of only the final checkpoint.

How to Prevent Catastrophic Forgetting in LLM Fine-Tuning
python train_es.py \
  --base-model meta-llama/Llama-3.1-8B-Instruct \
  --target-task countdown \
  --population-size 64 \
  --iterations 300 \
  --eval-every 10 \
  --save-checkpoints true

After the run, compare prior-task scores at each checkpoint. You should see whether accuracy falls early, then rebounds, which indicates temporary drift rather than permanent damage.

Step 3: Add Anchored Weight Decay

Your third outcome is a training rule that limits parameter drift away from the starting point. Anchored Weight Decay, or AWD, adds a simple penalty that pulls updates back toward the anchor weights from the pre-fine-tuned model.

Implement AWD by keeping a frozen copy of the initial weights and adding a decay term to the ES update rule. Conceptually, the update should favor improvements on the target task while discouraging large moves in weight space that can erase prior skills.

After enabling AWD, you should see smaller swings in prior-task accuracy across iterations. If the target task still improves but old-task scores stop collapsing, the anchoring is doing its job.

Step 4: Compare ES against GRPO on the same tasks

Your fourth outcome is a fair comparison that tells you whether forgetting is specific to ES or part of the broader post-training setup. Run the same target task and evaluation suite with GRPO, using the same base model, prompt set, and benchmark schedule.

Track both target-task progress and average prior-task accuracy over time. The key is not just the final score, but the shape of the curve during training.

You should see that GRPO can also forget prior tasks, even if its drift pattern differs from ES. If both methods degrade old skills, the likely fix is not a method swap alone, but a stronger retention strategy.

Step 5: Promote the best checkpoint with drift checks

Your final outcome is a release rule for choosing the checkpoint that best balances new skill and old skill. Select the model version that meets your target-task threshold while keeping prior-task scores within your acceptable loss budget.

Before promotion, rerun the full prior-task suite on the chosen checkpoint and compare it to the original baseline. Keep the checkpoint only if the model preserves the capabilities you care about and the drift stays within tolerance.

If the final checkpoint recovers prior-task performance, you should see near-baseline results rather than a permanent drop. That is the signal that the model adapted without catastrophic forgetting.

MetricBefore/BaselineAfter/Result
HellaSwag prior-task accuracyBaseline checkpointDropped by 8% early, then recovered near baseline
Training interpretationAssumed irreversible forgettingObserved temporary drift in weight space
Retention strategyPlain ES updateES plus Anchored Weight Decay

Common mistakes

  • Only checking the final checkpoint. Fix: log prior-task scores throughout training so you can spot temporary drift and recovery.
  • Changing the evaluation setup between runs. Fix: keep prompts, decoding settings, and seeds constant across baseline and fine-tuned models.
  • Treating forgetting as ES-only. Fix: test another post-training method, such as GRPO, before concluding the issue is tied to one optimizer.

What's next

Once you have stable retention, extend the same workflow to multi-task continual fine-tuning, tune the AWD strength per domain, and add automatic rollback rules for checkpoints that exceed your allowed drift window.