AI doesn’t fail at randomness. It fails at complexity.

09 Jun 2025

A screenshot from the paper: The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Model...
Apple just tested the smartest "reasoning" AI models out there: Claude 3.7 Sonnet, DeepSeek-R1, and OpenAI’s o1/o3.
The verdict?

They didn’t just underperform.
They 𝗰𝗼𝗹𝗹𝗮𝗽𝘀𝗲𝗱 when things got too complex.

Even when you gave them the algorithm, they couldn’t follow it.
Worse, when tasks got harder, they 𝗿𝗲𝗮𝘀𝗼𝗻𝗲𝗱 𝗹𝗲𝘀𝘀, not more.

This confirms what many testers already feel in their gut:
AI looks smart until it has to think.

Because real reasoning isn’t just generating confident answers.
It’s about:

• Navigating uncertainty
• Spotting what’s missing
• Asking, “Wait, does this even make sense?”

And that’s what great testers do every day.

We don’t just validate that something works.
We question 𝘄𝗵𝘆, 𝗵𝗼𝘄, 𝗮𝗻𝗱 𝘄𝗵𝗮𝘁 could break it next.

AI can make us more productive.
But when complexity scales, 𝘁𝗵𝗲 𝗔𝗜 𝗶𝘀 𝗻𝗼𝘁 the reasoning engine.
𝗬𝗼𝘂 𝗮𝗿𝗲.

Original Paper: https://machinelearning.apple.com/research/illusion-of-thinking
Christine Pinto
Award-Winning QA Leader

Conference speaker on AI and Quality Leadership | Long-time tester | Building tools testers actually enjoy using | Join the quest to level up software quality

Chapter Lead