☀️ AI Morning Minute: The METR Study
The "4-Minute vs. 4-Hour" Rule: Measuring the true horizon of AI autonomy.
What it means:
Conducted by the Model Evaluation and Threat Research (METR) group, this landmark study (updated in February 2026) measures AI not by benchmark “scores” but by time horizons: how an AI’s success rate on a task changes with how long that task takes a human expert to complete (a rough sketch of how such a horizon is estimated follows the two points below). The core finding is a “cliff” in AI capability based on task duration:
The 4-Minute Rule: Current frontier models (like GPT-5.1 and Claude 4.5) have a ~100% success rate on tasks that take humans less than 4 minutes.
The 4-Hour Cliff: For tasks requiring more than 4 hours of human focus, AI success rates plummet to below 10%.
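For the technically curious, here is a minimal sketch of how a “50% time horizon” could be estimated. This is not METR’s actual code, and the eval results below are invented purely for illustration: the idea is to fit a logistic curve of success probability against the log of each task’s human completion time, then read off the duration where the curve crosses 50%.

```python
# Illustrative sketch only: made-up eval results, not METR's data or code.
import numpy as np
from scipy.optimize import curve_fit

# (minutes a human expert needs per task, did the AI agent succeed?)
human_minutes = np.array([1, 2, 3, 4, 8, 15, 30, 60, 120, 240, 480])
ai_success    = np.array([1, 1, 1, 1, 1,  1,  1,  0,   1,   0,   0])

def success_curve(log_minutes, midpoint, slope):
    # Logistic curve: P(success) as a function of log task duration.
    return 1.0 / (1.0 + np.exp(-slope * (log_minutes - midpoint)))

params, _ = curve_fit(success_curve, np.log(human_minutes), ai_success,
                      p0=[np.log(60), -1.0])
midpoint, slope = params

# The 50% time horizon is the duration where the fitted curve crosses 0.5,
# i.e. the midpoint of the logistic in log-duration space.
print(f"Estimated 50% time horizon: ~{np.exp(midpoint):.0f} human-minutes")
```

In practice such a fit would cover far more tasks per model; the point here is only that the “horizon” is the crossover duration where predicted success drops below 50%, not an average score.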
Why it matters:
The “Moore’s Law” for Agents: The study found that the length of task an AI can complete at a 50% success rate (its “time horizon”) has been doubling roughly every 7 months for the past six years (a quick back-of-the-envelope projection appears after this list).
Reliability vs. Reasoning: This explains why AI feels superhuman at writing an email but “fails” at managing a complex software project. AI struggles with long-chain reasoning: stringing together hundreds of small actions without a single fatal mistake (if each step succeeds 99% of the time, a 300-step chain succeeds only about 5% of the time).
The Productivity Slowdown: Interestingly, METR’s 2026 data shows that because developers are now tackling harder tasks with AI, they actually reported a 19% slowdown in 2025: fixing AI-generated errors took longer than writing the code from scratch would have.
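As referenced above, here is a quick extrapolation of the 7-month doubling trend. It assumes, purely for illustration, a current 50% horizon of about 4 hours; this is a toy projection, not a METR forecast.

```python
# Illustrative extrapolation of the reported 7-month doubling trend.
# Assumes a current 50% time horizon of ~4 hours; not a METR forecast.
DOUBLING_MONTHS = 7
current_horizon_hours = 4.0

for months_ahead in (0, 12, 24, 36, 48):
    horizon = current_horizon_hours * 2 ** (months_ahead / DOUBLING_MONTHS)
    print(f"+{months_ahead:>2} months: 50% horizon ~ {horizon:,.0f} hours "
          f"({horizon / 40:,.1f} work-weeks)")
```

On those toy assumptions, the 50% horizon would reach roughly a standard work-week of human effort within about two years; the projection is only as good as the doubling trend holding.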
Simple example:
Think of an AI as a high-speed sprinter.
Tasks < 4 mins: It can sprint to the mailbox and back with near-perfect success.
Tasks > 4 hours: It is being asked to run a marathon. It starts fast, but by mile 10, it gets confused, takes a wrong turn, and eventually collapses because it can’t keep its focus for that long.

