☀️ AI Morning Minute: Pretraining
Before an AI can do anything useful, it has to learn everything. That’s pretraining.
When you talk to an AI and it knows about history, science, code, literature, and the rules of grammar in a dozen languages, almost none of that came from fine-tuning or prompting. It came from pretraining: a massive, expensive, months-long process where the model reads an enormous slice of human-generated text and learns to predict what comes next. Everything an AI knows before it’s been trained for any specific job was absorbed during pretraining. It’s the foundation everything else is built on.
What it means
Pretraining is the initial training phase where a large language model learns from a vast dataset of text, typically books, websites, academic papers, and code, often totaling trillions of tokens (words or word fragments). The model’s task during pretraining is simple in theory: predict the next token given everything that came before. It does this trillions of times, adjusting its weights after each batch of attempts, until it gets very good at prediction.
That prediction task requires genuine language understanding. A model that reliably predicts what comes next has learned grammar, facts, reasoning patterns, and world knowledge as a byproduct. Pretraining a frontier model takes months and costs tens to hundreds of millions of dollars, which is why only a handful of organizations can do it.
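To make the next-token objective concrete, here is a minimal sketch in Python. It is a toy under loud assumptions, not how any lab actually trains a model: the "tokenizer" just splits on spaces, the model looks only at the previous token rather than everything that came before, and the function names (train, predict_next) and the tiny corpus are made up for illustration. What it does share with real pretraining is the loop: guess the next token, measure how wrong the guess was, and nudge the weights.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(text, epochs=200, lr=0.1):
    """Fit a toy next-token predictor; returns (vocab, weights, losses)."""
    tokens = text.split()  # whitespace "tokenizer": an assumption for brevity
    vocab = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(vocab)}
    size = len(vocab)
    # weights[i][j]: score for token j appearing right after token i
    weights = [[0.0] * size for _ in range(size)]
    pairs = [(index[a], index[b]) for a, b in zip(tokens, tokens[1:])]
    losses = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for prev, nxt in pairs:
            probs = softmax(weights[prev])
            epoch_loss += -math.log(probs[nxt])  # cross-entropy for this guess
            # Gradient of cross-entropy w.r.t. the scores is (probs - one_hot)
            for j in range(size):
                target = 1.0 if j == nxt else 0.0
                weights[prev][j] -= lr * (probs[j] - target)
        losses.append(epoch_loss / len(pairs))  # average loss this epoch
    return vocab, weights, losses

def predict_next(token, vocab, weights):
    """Return the most likely token to follow `token`."""
    probs = softmax(weights[vocab.index(token)])
    best = max(range(len(vocab)), key=lambda j: probs[j])
    return vocab[best]

# Hypothetical toy corpus; real pretraining data is trillions of tokens.
corpus = "the cat sat on the mat . the cat ate the fish ."
vocab, weights, losses = train(corpus)
print(predict_next("sat", vocab, weights))
print(f"average loss: {losses[0]:.2f} -> {losses[-1]:.2f}")
```

In this corpus "on" is the only token that ever follows "sat", so that is what the trained toy predicts, and the average loss falls from the first pass to the last. A real model swaps the lookup table for a neural network with billions of weights and a far longer context, but the guess, score, adjust cycle is the same idea.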
Why it matters
Pretraining is where the model’s fundamental capabilities are set. Fine-tuning and RLHF (reinforcement learning from human feedback) can shape behavior, but they add little knowledge that wasn’t already present after pretraining. A weak pretrained model produces a weak final model regardless of what comes after.
The data used in pretraining determines what the model knows, what biases it absorbs, and what languages and domains it handles well or poorly. Decisions about what to include, filter, or weight in the dataset have downstream effects on every product built on top of that model.
Pretraining is where most of the compute used to build a model gets spent, and it’s becoming more expensive, not less. Labs are training larger models on larger datasets, pushing costs higher each generation. Andrej Karpathy joined Anthropic in 2026 to work on using Claude to improve Claude’s own pretraining, one of the clearest signals of how central this phase remains.
Simple example
Think of pretraining as the years a person spends reading before they ever write anything professionally. A person who has read widely across science, history, fiction, and technical manuals will write better first drafts than someone who hasn’t, regardless of the specific writing training they receive later.
Pretraining is those years of reading, compressed into months of compute, done at a scale no human could match.