☀️ AI Morning Minute: Tokens...again
Every word you type costs something. Every word the AI writes back costs more. Tokens are the unit of that exchange.
We did a quick pass on tokens back in February. They’ve come up every week since, so they’re worth another look.
Every conversation is a transaction in tokens: you spend them on input, the model spends them on output, and when the budget runs out, things stop. When AI products hit unexpected limits, give truncated answers mid-thought, or pad responses with words you didn’t ask for, tokens are usually why.
What it means
A token is the basic unit AI models use to read and write text, roughly four characters or three-quarters of a word in English. Common short words are usually one token each. Longer or unusual words get split into pieces: depending on the tokenizer, "extraordinarily" might become three tokens and "tokenization" two. Your input prompt, the model's response, and system instructions running behind the scenes all count against the same budget, set by the model's context window.
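Here is a minimal sketch of that shared budget in Python. It assumes the four-characters-per-token rule of thumb rather than a real tokenizer, so the counts are estimates only. The function names and the idea of reserving tokens for the reply are illustrative, not any provider's API.

```python
import math

# Rule of thumb from the text: about four characters per token.
# A real tokenizer will give different counts; this is only an estimate.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a string from its length in characters."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def tokens_used(system_prompt: str, user_prompt: str, reserved_for_reply: int) -> int:
    """Add up everything that counts against the context window."""
    return (
        estimate_tokens(system_prompt)
        + estimate_tokens(user_prompt)
        + reserved_for_reply  # room set aside for the model's answer
    )


def fits(system_prompt: str, user_prompt: str, reserved_for_reply: int,
         context_window: int) -> bool:
    """Return True if the whole exchange is estimated to fit in the window."""
    return tokens_used(system_prompt, user_prompt, reserved_for_reply) <= context_window
```

Calling fits() before sending a long paste gives a rough early warning. For exact numbers you would need the tokenizer that matches your model.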
When that window fills, the model either stops, starts forgetting earlier parts of the conversation, or, in agentic systems, triggers compaction, where older turns are summarized to free up space. Most consumer tools don't show a token counter, so you often don't know you're close until you hit it.
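The "forgetting" behavior can be sketched as a sliding window: keep the system prompt, then keep as many of the newest messages as still fit. This is one simple strategy written under assumptions (the same four-characters-per-token estimate, a hypothetical trim_history helper), not a description of how any particular product does it.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (not a real tokenizer)."""
    return math.ceil(len(text) / 4)


def trim_history(system_prompt: str, messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages that fit in the budget after the system prompt.

    Older messages are dropped first, which is why the model appears to
    forget the start of a long conversation.
    """
    remaining = budget - estimate_tokens(system_prompt)
    kept = []
    # Walk backwards from the newest message and stop at the first one
    # that no longer fits.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    kept.reverse()  # restore chronological order
    return kept
```

Compaction would replace the dropped messages with a short summary instead of discarding them outright. That step is left out here.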
Why it matters
Output tokens cost roughly four to five times more than input tokens at most major providers. That gap creates a real commercial incentive toward verbosity. Separately, RLHF training can inadvertently reward longer responses because human raters often read length as thoroughness. A model learns to pad answers not because it has more to say, but because longer responses scored better in training.
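The arithmetic is worth seeing once. The sketch below uses a made-up input price and a five-times output multiplier, the upper end of the range above. Real prices vary by provider and model, so treat every number here as a placeholder.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_multiplier: float = 5.0) -> float:
    """Cost of one request in dollars.

    input_price_per_million is a hypothetical price per million input tokens.
    output_multiplier is how many times more an output token costs.
    """
    output_price_per_million = input_price_per_million * output_multiplier
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical: $2 per million input tokens, output at five times that.
prompt_only = request_cost(2000, 0, 2.0)   # cost of a 2,000-token prompt
reply_only = request_cost(0, 500, 2.0)     # cost of a 500-token reply
```

Under these assumed prices, the 500-token reply costs more than the 2,000-token prompt that produced it. That is why trimming padded output saves more than trimming the question.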
Running out of tokens mid-task has consequences beyond annoyance. In an agentic workflow, hitting the limit mid-run can corrupt output or cause the agent to lose track of earlier decisions. Token budgeting is real engineering work.
Tokens are also an attack surface. Prompt injection attacks work by slipping adversarial instructions into the context, and some variants flood it so the original system prompt gets crowded out. In those cases, the more tokens an attacker can insert, the more influence they gain. Token limits aren’t just a cost issue. They’re a security boundary.
Simple example
You paste a long contract into an AI tool and ask it to summarize the key risks. It starts well, then cuts off. The contract plus your question plus the model’s background instructions left too little room in the context window for the full answer. The model didn’t forget how to read. It ran out of room. Shortening your paste, removing boilerplate, or switching to a model with a larger context window all fix the same problem: you were over budget.

