Tag
1 article
A benchmark finds LLMs are strong on standard probability problems but falter on counterintuitive ones.