☀️ AI Morning Minute: Jalapeño
OpenAI spent years paying NVIDIA for chips. Yesterday, they announced their own.
So yeah, running AI at scale is expensive, and most of that expense runs through one company: NVIDIA.
Every major AI lab, including OpenAI, has depended on NVIDIA’s GPUs to train and serve their models. Jalapeño is OpenAI’s first custom chip, built with Broadcom specifically for running large language models. It’s a significant move, and it happened fast.
What it means
Jalapeño is a custom inference processor, meaning it’s designed not for training AI models but for running them at scale. That distinction matters. Training a given model is largely a one-time cost. Inference, the process of generating a response every time someone uses ChatGPT, happens billions of times a day, so its cost keeps growing with usage. Optimizing for inference is where the real cost savings live.
OpenAI designed Jalapeño from scratch around its own models and worked with Broadcom to build it. The entire development cycle, from initial design to working silicon, took nine months. That’s unusually fast for a custom chip. OpenAI says they accelerated the process using their own AI models to assist with chip design. The chip is a reticle-sized ASIC, meaning the die is as large as a single photolithography exposure allows, which is one way to pack more compute onto one piece of silicon without combining multiple chips.
Early testing shows Jalapeño delivers substantially better performance per watt than current state-of-the-art processors. OpenAI plans to deploy it at gigawatt scale across data center partners over multiple generations.
Why it matters
It’s a direct shot at NVIDIA dependency. Every dollar OpenAI spends on custom silicon is a dollar that doesn’t go to NVIDIA. Google has Tensor Processing Units. Amazon has Trainium. Apple has its own chips. OpenAI was the last major AI player still entirely dependent on merchant silicon. That just changed.
Inference cost is the real business problem. Training makes headlines. Inference pays the bills, or eats them. Running ChatGPT for hundreds of millions of users requires enormous compute. A chip optimized specifically for LLM inference, rather than for the broad range of workloads NVIDIA’s general-purpose GPUs handle, could meaningfully cut what it costs OpenAI to serve each response.
It signals OpenAI’s long-term ambitions. Chip development is expensive, slow, and risky. Companies only do it when they believe they’ll be running at a scale that justifies the investment for years. OpenAI is signaling they expect to be running at that scale, and that they intend to control more of their own stack to get there.
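For readers who like to see the arithmetic, here is a rough sketch of the build-versus-buy reasoning in the points above. Every number in it is a made-up placeholder, not a figure from OpenAI, Broadcom, or NVIDIA, and the function name break_even_millions is just an illustrative label. The only idea it shows is that a large upfront cost pays back once enough requests are served at a lower cost each.

```python
def break_even_millions(upfront_cost, bought_cost_per_million, custom_cost_per_million):
    """Return how many millions of requests it takes for custom silicon
    to pay back its upfront cost through cheaper serving."""
    saving_per_million = bought_cost_per_million - custom_cost_per_million
    if saving_per_million <= 0:
        raise ValueError("custom silicon never pays back unless it is cheaper per request")
    return upfront_cost / saving_per_million


# All figures below are hypothetical placeholders, not reported numbers.
UPFRONT_COST = 1_000_000_000        # dollars spent designing and building the chip
BOUGHT_COST_PER_MILLION = 4_000     # dollars per million requests on purchased GPUs
CUSTOM_COST_PER_MILLION = 3_000     # dollars per million requests on the custom chip
REQUESTS_PER_DAY = 1_000_000_000    # assumed daily request volume

millions_needed = break_even_millions(
    UPFRONT_COST, BOUGHT_COST_PER_MILLION, CUSTOM_COST_PER_MILLION
)
days_to_break_even = millions_needed * 1_000_000 / REQUESTS_PER_DAY

print(f"Break-even after {millions_needed:,.0f} million requests, "
      f"about {days_to_break_even:,.0f} days at this volume")
```

With these placeholder inputs, the payback point scales directly with volume: double the daily requests and the break-even time halves. That is the sense in which scale changes the math.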
Simple example
A restaurant that buys all its produce from one supplier is at that supplier’s mercy on price, availability, and terms. Build your own farm and you control the inputs. It costs more upfront and requires expertise you didn’t have before. But at scale, the math changes.
OpenAI has been buying produce. Jalapeño is the farm.

