Apple’s new research paper says AI reasoning isn’t all it’s cracked up to be.

Right before WWDC 2025, Apple researchers published a paper called The Illusion of Thinking (PDF) that made waves. The researchers wrote that popular and buzzy AI models “face a complete accuracy collapse beyond certain complexities,” especially with things they’ve never seen before.

They presented models from OpenAI, Anthropic, and DeepSeek with new and complex puzzle games and found their reasoning ability “increases with problem complexity up to a point, then declines.”

The paper's abstract reads:

“Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from data contamination and does not provide insights into the reasoning traces’ structure and quality. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of compositional complexity while maintaining consistent logical structures.”

Image: Apple