Tag
1 article
This paper argues that autoregressive language models can exhibit lookahead behavior despite being trained only on next-token prediction.