☀️ AI Morning Minute: Compaction

AI agents can work for hours on a complex task. But they have a memory limit. Compaction is how they keep going.

May 31, 2026

Every AI model has a context window, a fixed amount of text it can hold in working memory at once. In a short conversation, that limit never comes up. But an AI agent working through a codebase for two hours, reading files, running commands, and tracking decisions, fills that window fast. Compaction is the mechanism that lets it keep going. Instead of hitting the limit and stopping, the agent compresses its own history into a summary and continues from there.

What it means

Compaction is an automatic context management process. When an AI agent approaches its context window limit, it summarizes the older parts of its conversation history and replaces the full transcript with a condensed version so the session can continue. In Claude Code, compaction triggers when the context reaches roughly 95% of its token limit. The agent sends its conversation history to a separate model call with instructions to compress it into key facts, then resumes from that summary. You don’t see it happen.

At some point the agent is working from a compressed version of earlier decisions rather than the full record. Anthropic describes the goal as keeping active context focused and performant, not just staying under a token cap.
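To make the mechanism concrete, here is a minimal Python sketch of a threshold-triggered compaction step. It is an illustration under stated assumptions, not how Claude Code is actually implemented: the `Message` type, the four-characters-per-token estimate, the `keep_recent` parameter, and the `placeholder_summarize` function are all hypothetical. In a real agent, the summarizer would be a separate model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Assumption: roughly four characters per token. Real tokenizers differ.
    return max(1, len(text) // 4)


def total_tokens(history: List[Message]) -> int:
    return sum(estimate_tokens(m.text) for m in history)


def placeholder_summarize(messages: List[Message]) -> str:
    # Hypothetical stand-in for a model call that compresses history
    # into key facts. Here it just keeps the start of each message.
    return "Summary of earlier work: " + "; ".join(m.text[:30] for m in messages)


def maybe_compact(
    history: List[Message],
    limit: int,
    summarize: Callable[[List[Message]], str],
    threshold: float = 0.95,
    keep_recent: int = 2,
) -> List[Message]:
    """Return the history unchanged, or a compacted copy once usage
    reaches threshold * limit."""
    if total_tokens(history) < threshold * limit:
        return history
    split = len(history) - keep_recent
    if split <= 0:
        # Nothing old enough to summarize.
        return history
    older, recent = history[:split], history[split:]
    # Replace the older messages with one summary message.
    return [Message("summary", summarize(older))] + recent
```

Calling `maybe_compact(history, limit, placeholder_summarize)` after each turn would leave short sessions untouched and swap older messages for a single summary once the threshold is crossed. Keeping the last few messages verbatim is a design choice in this sketch, not something the description above specifies.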

Why it matters

Without compaction, long agentic tasks would hit the context ceiling and stop, or require you to manually restart with a fresh summary. Compaction makes multi-hour sessions practical. Real-world reports put token cost savings at 50 to 60% in multi-phase workflows.

What survives compaction matters as much as the process itself. Project-level instruction files reload from disk and survive intact. But older decisions, nested instructions, and subtle preferences from earlier in the session can be summarized, truncated, or dropped. If something must persist, it needs to live in a durable file before compaction kicks in.

Compaction isn’t universally loved. Some developers run shorter, cleaner sessions to avoid it, arguing that a model working from a compressed summary is less reliable than one starting fresh. The tradeoff is real: longer sessions with compaction, or shorter sessions with manual handoffs.
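The point about durable files can be sketched in a few lines of Python. This is one possible approach, not a documented Claude Code feature: the `DECISIONS.md` file name and both helper functions are hypothetical, and whether an agent re-reads such a file after compaction depends on how your tool is set up.

```python
from pathlib import Path
from typing import List


def record_decision(notes_path: Path, decision: str) -> None:
    # Append each decision as a bullet so it exists on disk,
    # independent of whatever the conversation summary keeps.
    with notes_path.open("a", encoding="utf-8") as f:
        f.write(f"- {decision}\n")


def load_decisions(notes_path: Path) -> List[str]:
    # Read the bullets back, for example at the start of a fresh session.
    if not notes_path.exists():
        return []
    lines = notes_path.read_text(encoding="utf-8").splitlines()
    return [line[2:] for line in lines if line.startswith("- ")]
```

A typical use would be calling `record_decision(Path("DECISIONS.md"), "Keep the legacy API until v2")` whenever a decision is made, so the record does not depend on what the summary happens to retain.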

Simple example

You’re three hours into a Claude Code session refactoring a large codebase. The agent has read dozens of files, made hundreds of tool calls, and tracked a running list of decisions. The context hits roughly 95% of its limit. Compaction fires: the agent summarizes the session so far, replaces the full history with that summary, and picks up from there.

You keep working. But the detailed reasoning from hour one is now a paragraph, not a transcript.

Do you know what Claude Code is?

Claude Code is the most powerful way to work with Claude — but it’s built for developers, which means setup is where most people give up before they ever see what it can do.

I built a free wizard to fix that. It walks you through the whole thing one step at a time. No prior coding required, and no guessing about what to click next.

If “I keep hearing about Claude Code but have no idea how to start” sounds like you, this is for you. Try it. It’s free.

→ Try the setup wizard
