Tag
1 article
This paper reframes supervised fine-tuning as designing target distributions, not just minimizing token loss.