Apple just tested the smartest "reasoning" AI models out there: Claude 3.7 Sonnet, DeepSeek-R1, OpenAI’s o1/o3.
The verdict?
They didn’t just underperform.
They collapsed when things got too complex.
Even when you gave them the algorithm, they couldn’t follow it.
Worse, when tasks got harder, they reasoned less, not more.
This confirms what many testers already feel in their gut:
AI looks smart until it has to think.
Because real reasoning isn’t just generating confident answers.
It’s about:
• Navigating uncertainty
• Spotting what’s missing
• Asking, “Wait, does this even make sense?”
And that’s what great testers do every day.
We don’t just validate that something works.
We question why, how, and what could break it next.
AI can make us more productive.
But when complexity scales, the AI is not the reasoning engine.
You are.
Original Paper: https://machinelearning.apple.com/research/illusion-of-thinking
Christine Pinto
CPTO of Epic Test Quest
Co-Founder and CTPO @Epic Test Quest | Conference speaker on AI and Quality Leadership | Long-time tester | Building tools testers actually enjoy using | Join the quest to level up software quality