On April 24, 404 Media published a piece titled "The AI compute crunch is here". The argument is that the era of cheap, unlimited AI is ending. In the same week, GitHub paused new Copilot Pro, Pro+ and Student signups, Anthropic tested removing Claude Code from its $20 Pro plan, Apple's $599 M4 Mac mini sold out on its own site, and Samsung's mobile chief warned the phone unit could post its first annual net loss in 2026. None of those events alone is the story. The story is that they happened in the same week.

Cheap AI was always a subsidy

For years, AI apps trained users on the wrong economics. One monthly fee, unlimited prompts, bigger models, faster answers. The math was never sustainable. The inputs behind inference, server racks, memory, power contracts, cooling and cloud capacity, were never cheap. What kept it cheap for the user was the gap between price and unit cost, a gap investors agreed to absorb to win market share.

That gap is closing. Not because any single company decided to raise prices, but because the underlying inputs ran out of slack at the same time.

The rationing has reached the product layer

GitHub paused new signups for Copilot Pro, Pro+ and Student plans on April 20, while tightening usage limits and removing access to several expensive models. Anthropic tested removing Claude Code from about 2 percent of new $20-per-month Pro signups; after backlash it reverted the public pricing page, though the 2 percent test itself continued.

This is the part most readers can already feel. The product surface that used to say "unlimited" now says "limited" or "sold out". Companies are not announcing price hikes. They are gating signups quietly, capping heavy users, and letting the rationing show up in the UX before it shows up in a headline.

OpenAI's CFO Sarah Friar has said publicly that the company does not have enough compute. That signal is coming from across the labs, not just one. Inference is no longer free, even internally.

The strain is leaking into hardware

TechCrunch reported that the $599 M4 Mac mini base model sold out on Apple's site for the first time, while open-box units appeared on eBay between $715 and $795. The cause is not a manufacturing shortage. It is local AI demand. A consumer desktop is now compute inventory because its unified memory makes it useful for running on-device models.

Samsung is feeling the same pressure from the other side of the stack. Its mobile chief, TM Roh, internally warned that the mobile division could post its first-ever annual net loss as DRAM and NAND prices double. According to reporting from Ars Technica, RAM could account for more than a third of the build cost of a budget phone by mid-2026. Memory is becoming a bill-of-materials problem for phones, not just a server problem.

The infrastructure beats are the loud ones. The quiet beat is that one upcoming Nvidia AI rack carries memory equivalent to roughly 4,600 Galaxy S26 Ultra phones. AI server demand is not just one buyer at the table. It is the buyer that sets the price.
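The phone-equivalence claim is easy to sanity-check with back-of-envelope arithmetic. A minimal sketch, where the per-phone RAM figures are illustrative assumptions (the article does not give the Galaxy S26 Ultra's actual spec):

```python
# Back-of-envelope: total memory implied by the "4,600 phones" comparison.
# The per-phone RAM values below are assumptions, not figures from the article.
PHONES = 4600

def implied_rack_memory_tb(ram_gb_per_phone: float) -> float:
    """Total memory, in TB, that PHONES handsets would collectively carry."""
    return PHONES * ram_gb_per_phone / 1000  # using 1 TB = 1000 GB

for ram in (12, 16):
    print(f"{ram} GB/phone -> ~{implied_rack_memory_tb(ram):.0f} TB of rack memory")
```

Either assumption puts a single rack in the tens of terabytes of memory, which is why one hyperscale buyer can move the price of DRAM for everyone else.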

Cost per token is the new benchmark

OpenAI researcher Noam Brown wrote that "what matters is intelligence per token or per $". In a world of unlimited compute, that line is academic. In a world where compute is scarce, it is a strategy.

Exponential View made the same argument in a different register. The simplest American playbook for the past few years was more compute, better benchmarks. Scarcity changes the scoreboard. Real capability now has to be measured per token, per dollar and per user. That is not a minor accounting shift. It changes which company wins the next round.
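The per-dollar scoreboard can be sketched in a few lines. All numbers below are illustrative assumptions, not figures from the article or any vendor, but the arithmetic shows why a smaller, cheaper-to-serve model can win even with a lower benchmark score:

```python
# Sketch of "intelligence per dollar" accounting under compute scarcity.
# GPU prices, throughputs and benchmark scores here are hypothetical.

def cost_per_million_tokens(gpu_hour_usd: float, tokens_per_sec: float) -> float:
    """Serving cost in USD per million output tokens on one GPU."""
    tokens_per_hour = tokens_per_sec * 3600
    return gpu_hour_usd / tokens_per_hour * 1_000_000

def intelligence_per_dollar(benchmark_score: float, usd_per_m_tokens: float) -> float:
    """Benchmark points bought per dollar of inference, per million tokens."""
    return benchmark_score / usd_per_m_tokens

# A large model vs a cheaper one, both rented at $3.00 per GPU-hour:
big = cost_per_million_tokens(gpu_hour_usd=3.00, tokens_per_sec=400)     # slower, smarter
small = cost_per_million_tokens(gpu_hour_usd=3.00, tokens_per_sec=2000)  # faster, dumber

print(f"big:   {intelligence_per_dollar(90, big):.0f} points per dollar")
print(f"small: {intelligence_per_dollar(80, small):.0f} points per dollar")
```

Under these assumed numbers the smaller model delivers several times more benchmark points per dollar despite scoring lower, which is the whole point of the new scoreboard.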

The constraint is not evenly distributed. Export controls, chip supply, cloud capacity and domestic hardware readiness make compute a strategic constraint, especially in China. The teams that have spent years optimizing under hardware caps may be better positioned than the teams that have spent years scaling on credit. The debate is also spreading beyond Silicon Valley, with engineers asking why their own ecosystems are still betting on application layers and not on the stack underneath.

Why it matters

For three years, the user experience of AI was abundance. You opened a chat box. You asked. You got an answer that felt free, because someone else was paying. That UX, more than any model release, is what made AI feel inevitable. It taught a generation of users that AI is the way you do work, the way you write code, the way you draft emails, all without thinking about the meter running underneath.

Subsidy is what shaped expectations. When subsidy ends, expectations have to readjust. The reaction is not just price. It is throttling. It is gating. It is the slow conversion of "unlimited" into "depends on your tier". The wave of free-tier reductions, model availability changes and signup pauses is not a marketing story. It is the user-facing edge of an inference cost that was always real.

The next AI race will not be only about who can build the smartest model. It will be about who can afford to run it at the price they trained users to expect.

If AI gets more capable by consuming scarce infrastructure, who should absorb the cost when the infrastructure runs out: the users who came to depend on it, or the platforms that taught them to expect abundance?

Originally published as an Instagram carousel on @recul.ai.