Tag
distillation
Distillation transfers a larger model’s behavior, such as ranking preferences, generation patterns, or reasoning signals, into a smaller student model. Teams use it to cut inference cost and latency while keeping small language models (SLMs) useful for reranking, generation, and cross-architecture alignment.
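To make the idea of transferring a larger model's behavior into a student concrete, here is a minimal sketch of one common objective: the classic soft-label loss, which is the KL divergence between temperature-softened output distributions of the two models. It is an illustration only, not the method of any article listed here. The function names, the example logits, and the default temperature of 2.0 are all assumptions chosen for this sketch.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions.

    Multiplying by temperature**2 is the usual convention that keeps
    gradient magnitudes comparable across temperatures.
    The default temperature is an illustrative assumption.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


# Hypothetical logits over three classes (or three candidate tokens).
teacher = [2.0, 0.5, -1.0]
student = [0.1, 0.2, 0.3]

print(distillation_loss(teacher, student))  # positive: the student disagrees
print(distillation_loss(teacher, teacher))  # zero: identical distributions
```

In training, the student's parameters would be updated to push this loss toward zero, usually alongside a standard task loss. The articles below cover variants that change what is matched (rankings, on-policy samples, diffusion trajectories) rather than this basic setup.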
7 articles

OPD lets you distill skills without brute-force RL
I break down On-Policy Distillation and turn the idea into a copy-ready post-training template.

DanceOPD distills image-editing skills into one model
DanceOPD trains flow-matching image models to combine text-to-image and editing skills without them fighting each other.

UNIEGO unifies egocentric video with proxy teachers
UNIEGO uses proxy models to distill nine teachers into one egocentric encoder.

Apple’s Gemini deal turns cloud AI into local AI
Apple is using Google Gemini distillation and Nvidia confidential compute to push Siri toward local-first AI with cloud backup.

CARV cuts diffusion-teacher gradient variance
CARV reduces Monte Carlo variance in diffusion-teacher pipelines by reusing expensive upstream work and smarter noise sampling.

Select-to-Think: Let SLMs Re-rank Themselves
A new method lets small language models re-rank their own candidates instead of calling an LLM at inference time.

TIDE distills diffusion LLMs across architectures
TIDE adds noise-aware weighting and tokenizer-aware objectives to cross-architecture distillation of diffusion LLMs, improving a 0.6B student.