Tag

distillation

Distillation transfers a larger teacher model’s behavior (ranking preferences, generation patterns, or reasoning signals) into a smaller student model. Teams use it to cut inference cost and latency while keeping smaller models, including small language models (SLMs), useful for reranking, generation, and cross-architecture alignment.
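For readers new to the idea, a minimal sketch of the generic soft-target recipe may help. It assumes the classic setup where teacher and student produce logits over the same output classes: both are softened with a temperature, and the student is penalized by the KL divergence from the teacher's distribution, scaled by the temperature squared. This is the textbook logit-distillation loss, not the method from any article listed below (on-policy, diffusion, and cross-tokenizer variants differ), and the function names are illustrative.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps the loss magnitude comparable when the
    temperature changes, as in the standard soft-target formulation.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's softened predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    close_student = [3.8, 1.1, 0.4]  # ranks classes like the teacher
    far_student = [0.5, 1.0, 4.0]    # reverses the teacher's preference
    print(distillation_loss(teacher, close_student))
    print(distillation_loss(teacher, far_student))
```

A student whose softened distribution tracks the teacher's gets a smaller loss than one that reverses its preferences; in practice this term is usually mixed with the ordinary task loss on ground-truth labels.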

7 articles

Research/Jun 29

OPD lets you distill skills without brute-force RL

I break down On-Policy Distillation and turn the idea into a copy-ready post-training template.

Research/Jun 26

DanceOPD distills image-editing skills into one model

DanceOPD trains flow-matching image models to combine text-to-image and editing skills without the two skills interfering with each other.

Research/Jun 19

UNIEGO unifies egocentric video with proxy teachers

UNIEGO uses proxy models to distill nine teachers into one egocentric encoder.

Industry News/Jun 4

Apple’s Gemini deal turns cloud AI into local AI

Apple is using Google Gemini distillation and Nvidia confidential compute to push Siri toward local-first AI with cloud backup.

Research/May 21

CARV cuts diffusion-teacher gradient variance

CARV reduces Monte Carlo variance in diffusion-teacher pipelines by reusing expensive upstream work and sampling noise more carefully.

Research/Apr 30

Select-to-Think: Let SLMs Re-rank Themselves

A new method lets small language models re-rank their own candidates instead of calling an LLM at inference time.

Research/Apr 30

TIDE distills diffusion LLMs across architectures

TIDE transfers knowledge between diffusion LLMs with different architectures, using noise-aware weighting and tokenizer-aware objectives to improve a 0.6B-parameter student.