Tag
1 article
This paper maps when multimodal training should align views, when it should predict across them, and when it should be avoided.