Tag
1 article
A new paper shows audio-language models often encode the right audio answer, but text still wins the final decision.