☀️ AI Morning Minute: ARC-AGI-3
The new test for AI intelligence where every top model scores below 1%
Benchmarks tell you what AI can do. But the best ones tell you what it can’t. And right now, the most honest measure of the gap between human and machine intelligence is a set of simple video games that no AI model can figure out.
What it means:
ARC-AGI-3 is an interactive reasoning benchmark released by the ARC Prize Foundation, co-founded by AI researcher Francois Chollet and Zapier co-founder Mike Knoop. Unlike its predecessors, which had models find patterns in static grid puzzles, ARC-AGI-3 drops AI agents into turn-based game environments with no instructions, no rules, and no stated goals.
The agent has to explore, figure out how the world works, discover the win condition, and execute a plan. All on its own.
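To picture that loop in code, here's a minimal sketch in Python. Everything in it is invented for illustration (the HiddenRulesEnv class, its four unlabeled actions, the hidden goal cell); the real benchmark's environments and API are richer than any toy:

```python
# A toy stand-in for an ARC-AGI-3-style environment. The real benchmark's
# action space, observations, and API are not shown in this issue; every
# name below is hypothetical.
import random

class HiddenRulesEnv:
    """A turn-based world whose rules and win condition stay hidden."""

    def __init__(self, seed: int):
        rng = random.Random(seed)
        self.pos = [0, 0]                                 # the agent can see this...
        self.goal = (rng.randrange(5), rng.randrange(5))  # ...but never this
        self.turns_left = 40

    def step(self, action: int):
        """Actions 0-3 move a cursor on a 5x5 grid; the agent isn't told that."""
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action % 4]
        self.pos[0] = min(4, max(0, self.pos[0] + dr))
        self.pos[1] = min(4, max(0, self.pos[1] + dc))
        self.turns_left -= 1
        won = tuple(self.pos) == self.goal                # the undisclosed win condition
        return list(self.pos), won, self.turns_left <= 0

def explore(env: HiddenRulesEnv) -> bool:
    """The naive baseline: poke at unlabeled actions and watch what changes."""
    while True:
        obs, won, out_of_turns = env.step(random.randrange(4))
        if won:
            return True
        if out_of_turns:
            return False
```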
Behind the name:
“ARC” stands for Abstraction and Reasoning Corpus. “AGI” is in the name because Francois Chollet designed it specifically to measure the kind of general reasoning you’d need for artificial general intelligence. And the number tracks the versions: each new one exists because AI caught up to the last.
ARC-AGI-3 is the third of those iterations:
ARC-AGI-1 came out in 2019. Static grid puzzles where the model had to find patterns in input/output pairs (a toy version of the format is sketched right after this list). By 2025, frontier models were scoring 90%+ on it. Basically solved.
ARC-AGI-2 launched in March 2025. Harder compositional puzzles, still static. Models started low but climbed fast. Gemini 3.1 Pro hit 77% by early 2026.
ARC-AGI-3 dropped in March 2026. Completely different format. Interactive, turn-based game environments instead of static puzzles. No instructions, no goals given.
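As promised above, here's a toy version of the static format ARC-AGI-1 introduced. The train/test structure of paired integer grids mirrors the public ARC task files, but the puzzle itself is invented and far easier than the real ones:

```python
# A toy ARC-AGI-1-style task. The train/test structure of paired integer
# grids (values 0-9 map to colors) mirrors the public ARC task files; the
# puzzle itself is invented. Hidden rule: mirror each grid left to right.
toy_task = {
    "train": [
        {"input":  [[1, 0, 0],
                    [1, 1, 0]],
         "output": [[0, 0, 1],
                    [0, 1, 1]]},
        {"input":  [[2, 2, 0],
                    [0, 2, 0]],
         "output": [[0, 2, 2],
                    [0, 2, 0]]},
    ],
    "test": [
        {"input":  [[3, 0, 0],
                    [0, 3, 3]]}  # the solver must produce the mirrored grid
    ],
}

def solve(grid):
    """The rule a successful solver would infer from the train pairs."""
    return [row[::-1] for row in grid]

print(solve(toy_task["test"][0]["input"]))  # [[0, 0, 3], [3, 3, 0]]
```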
Why it matters:
Humans solve 100% of the environments. The best AI model, Gemini 3.1 Pro, scores 0.37%. GPT 5.4 hits 0.26%. That’s not a gap you close with a bigger training run. It points to something missing in how current models reason.
Earlier versions of the benchmark are already beaten. Gemini 3.1 Pro scores 98% on ARC-AGI-1. But those tests measured pattern recognition in static puzzles. ARC-AGI-3 measures something harder: can you learn new skills on the fly in an environment you’ve never seen before?
The ARC Prize 2026 competition offers over $2 million in prizes, and all solutions must be open-sourced. A high score on ARC-AGI-3 could serve as evidence of AGI, because the kind of reasoning it requires (adapting to new environments without training) is what you’d need to do “most economically valuable work,” which is how most labs define AGI.
Simple example:
You’re dropped into a board game you’ve never seen. No rulebook, no one to explain the rules. You push a piece and something happens. You push another and something different happens. Within a few turns, you’ve figured out the goal and you’re winning.
An eight-year-old can do this.
The best AI on the planet scores less than 1%. ARC-AGI-3 is that board game, run a thousand times with a thousand different rule sets.
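And if you want the scoring made literal, here's the same idea using the toy environment sketched earlier. Illustrative only: the toy is easy enough that even random poking wins, which the real environments emphatically are not.

```python
# The "thousand rule sets" framing, made literal with the toy HiddenRulesEnv
# from earlier: one agent, many seeded environments, score = fraction solved.
# (The toy is trivially easy, so even a random walker does fine here; the
# real environments are where frontier models land below 1%.)
n_games = 1000
solved = sum(explore(HiddenRulesEnv(seed)) for seed in range(n_games))
print(f"solve rate: {solved / n_games:.1%}")
```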
If reading this each morning has you thinking "I should probably understand this stuff better," good news: I run a 90-minute workshop called Making Sense of AI. Plain language, live demos, no technical background needed. Next session is April 8th, 10am Pacific. $50.