Tag

LLM fine-tuning

LLM fine-tuning covers the methods used to adapt a base model to a specific task or domain, from supervised fine-tuning to RL-based alignment. In practice, training stability, data pipelines, and tooling largely determine how well a fine-tune performs. Articles under this tag range from BPO and GBPO as alternatives to PPO's clipped objective to AWS workflows built on S3, SageMaker, and MLflow.

17 articles

Google OpenRL brings RL fine-tuning to Kubernetes
Model Releases/Jun 27

Google’s OpenRL lets teams run LLM post-training and fine-tuning on their own Kubernetes clusters.

LLM fine-tuning turns generic models into domain tools
Research/Jun 27

A practical breakdown of enterprise LLM fine-tuning, from data prep to model choice, plus a copy-ready template.

LLM Fine-Tuning for Production in 2026
Research/Jun 24

AgamiSoft’s guide maps the 2026 fine-tuning choices for production LLMs, from open models to data prep, evaluation, and deployment.

Fine-Tuning LLMs Locally: SFT, LoRA, DPO
Tools & Apps/Jun 19

LLM Configurator’s Guide 13 explains when to fine-tune, how SFT, LoRA, and DPO differ, and how to prepare and evaluate datasets.

PEFT for LLM Fine-Tuning Without Full Retraining
AI Agent/Jun 14

PEFT lets developers fine-tune LLMs by training small adapter layers instead of all weights.

LLM research engineers turn post-training into services
AI Agent/Jun 14

A practical breakdown of Codersarts’ on-demand LLM training work, with a copy-ready template for evals, SFT, RLHF, and alignment.

How to Prevent Catastrophic Forgetting in LLM Fine-Tuning
Research/Jun 6

Use anchored weight decay to reduce prior-task drift during LLM fine-tuning.

Fixing LLM forgetting in ES fine-tuning
Research/Jun 5

This paper shows LLM fine-tuning with evolution strategies can drift, and anchored weight decay can curb it.

PEFT vs Full Fine-Tuning
Industry News/Jun 5

PEFT is the default for most LLM fine-tuning, while full fine-tuning fits edge cases needing deeper model change.

LoRA Makes Fine-Tuning LLMs Practical
Tools & Apps/May 31

LoRA cuts LLM fine-tuning to a small adapter layer, reducing VRAM, training time, and cost for teams with modest GPUs.

How to Fine-Tune LLMs with SFT, LoRA, and RLHF
Research/May 30

Learn how to fine-tune a large language model with supervised training, LoRA, and alignment methods like RLHF and DPO.

How to Fine-Tune an LLM for Enterprise
AI Agent/May 21

A practical guide to choosing, training, and evaluating an enterprise LLM fine-tune.

Why fine-tuning LLMs for domain tasks is the right default
Research/May 16

Fine-tuning is the best default when an LLM must be accurate in a narrow domain.

LoRA vs QLoRA vs Full Fine-Tuning
Industry News/May 16

A practical comparison of LoRA, QLoRA, and full fine-tuning for 2026 LLM projects.

Why Latent Agents Proves Multi-Agent Debate Should Be Internalized
Research/May 5

Latent Agents shows multi-agent debate works best when a single model internalizes it.

Why Bounded Ratio RL Replaces PPO's Clipped Objective
Research/Apr 21

Bounded Ratio RL (BRRL) gives PPO a cleaner theoretical basis, with BPO and GBPO aiming for more stable policy updates in control and LLM fine-tuning.

AWS uses S3 to speed LLM fine-tuning
Model Releases/Apr 2

AWS shows how SageMaker Unified Studio, S3, and MLflow can fine-tune Llama 3.2 11B Vision Instruct on DocVQA data.