The bill for AI agents is no longer a flat rate

The cloud felt infinite until someone actually read the invoice: the AI industry is leaving the age of subsidies behind and entering the age of the token meter. The shift toward pay-as-you-go pricing exposes the real cost of artificial intelligence and reopens uncomfortable questions about sustainability, dependence, and productivity.

Rubén S.

Columnist on algorithmic rights and other digital wounds

1 Jun 2026

12 min read


For a couple of years we lived inside a comfortable fiction: artificial intelligence felt like just another subscription, with roughly the economic complexity of Netflix or Spotify, something you paid for once a month and then forgot about. You paid a flat fee, opened ChatGPT, Claude, Copilot, or Cursor, and could turn an aimless afternoon into a seemingly endless run of solved tasks. There was always one more issue to close. It barely mattered whether you used the thing for ten minutes or ten hours. The psychological cost matched the financial one: zero, at the margin.

It has been exhilarating, and it has changed us a great deal, but that chapter is winding down. It may already be over.

Not because AI has failed. Quite the opposite: it has started working so well that we can no longer picture a way of working that leaves it out. When a tool becomes a capability, and when a service becomes infrastructure, someone has to pay the bill. And that bill was never as tidy as a monthly subscription.

The move from flat subscriptions to usage caps, and from there to pay-as-you-go, is not a simple price hike. It is a cost reveal. In the first phase, users saw a chat interface and a monthly fee. Providers saw inference, GPUs, memory, electricity, cooling, capacity commitments, abuse, demand spikes, and users perfectly capable of extracting thousands of dollars in compute value out of plans priced in the low double or triple digits.

As long as the main goal was growth and user acquisition, that tension could stay hidden. The party had to keep going. Once agents become daily work tools, or the workforce itself, it can no longer stay hidden. Someone has to pay for it.

The flat rate was always a subsidy

Software as a service was built on an elegant promise: pay for access, not ownership. Don’t buy servers, don’t install versions, don’t maintain your own infrastructure. Subscribe. Let the provider update, scale, and operate. You just add users.

The per-seat model worked because a seat was a practical proxy for activity. More employees meant more usage, more value, and more willingness to pay. It wasn’t perfect, but it fit reasonably well with applications that humans used directly: CRMs, design tools, project trackers, office suites.

Agents break that symmetry. A human is no longer equivalent to a workflow. One human can orchestrate ten agents, each running dozens of steps, tool calls, queries to external resources, and validation loops. Work is no longer bounded by the pace of manual interaction. Activity breaks free from human limits.

Even though they’re built from the same parts, a working session with agents has little in common with a casual back-and-forth in a chat window. The first is a conveyor belt of inference: it plans, reads, writes, runs, fails, tries again, calls another tool, repeats, summarizes, corrects, runs the tests, fails, repeats, decides it needs a different approach, and by the time you look up, millions of tokens have quietly vanished.

For a while the providers absorbed those costs, or rather their investors did, perhaps hoping that adoption would eventually tip the scales in their favor. Or perhaps they suspected all along that adoption would grow not as a preference but as a dependence.

But one day someone in finance asks whether all of this is going to turn into a business, or whether it’s just a wildly expensive way to improve READMEs. The market has answered: AI stays, the all-you-can-eat buffet does not.

June 2026 as the end of an era

GitHub replacing its Premium Request Units with AI Credits carries a symbolic weight for developers and a very direct one for their wallets. For many of us, Copilot was the first generative AI woven into everyday work. The change isn’t merely rebranding units with a more financial-sounding name. The important difference is that cost is now tied much more directly to actual consumption.

Plan prices may not change on the surface. Copilot may still look like Copilot. But the economic experience changes profoundly. It breaks a basic assumption: what used to be a productivity license with a fixed or predictable cost now demands budgets, active management, constant oversight, and administrative decisions.

Anthropic, which had already been capping its users’ spending for some time, is moving in the same direction. OpenAI, with Codex, and Google, with Gemini, are following similar paths.

Taken one by one, these changes can look like routine commercial tweaks. Taken together, they signal a regime change. The major providers are converging on the same reality: access can still be packaged as a product, but the real work is metered like infrastructure.

Back in March, Sam Altman talked about a future in which intelligence would be bought by usage, with meters like electricity or water. At the time we didn’t imagine that future was so close, but we did know that intelligence is not electricity: it is, among other things, knowledge, creativity, experience, judgment, and power. From an operational standpoint, though, the industry seems to be pushing for AI to behave like a utility: always on, embedded in everything, essential to daily life, and billed on a regular cycle.

The cost is in keeping the thread

The least intuitive thing about agents isn’t how much they consume when they answer. It’s how much they consume just to keep track of what they’re doing.

In a short exchange, the cost is fairly easy to grasp: one input, one output, maybe a few rounds. In a working session, the unit of economics stops being the answer and becomes continuity. The system has to hold on to instructions, history, snippets of code, tool results, errors, earlier decisions, and enough context to choose the next step.

What looks like memory in the interface is usually repeated processing underneath. The agent doesn’t just move forward; it lugs a backpack of context around with it. And that backpack gets heavier the longer, more ambiguous, or noisier the task becomes.
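A rough way to see the backpack in numbers, as a sketch rather than a description of any provider’s billing: assume each step re-reads the full context accumulated so far and then adds a fixed amount to it. The function name and the figures are hypothetical, and the model ignores prompt caching and context compaction, which real tools use to soften this curve.

```python
def cumulative_input_tokens(steps: int, base_context: int, added_per_step: int) -> int:
    """Total input tokens processed over a session in which every step
    re-reads the whole context accumulated so far (a simplifying assumption)."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context            # the whole backpack is processed again
        context += added_per_step   # and it is heavier for the next step
    return total


# Hypothetical session: 5,000 tokens of starting context, 2,000 added per step.
short_session = cumulative_input_tokens(10, base_context=5_000, added_per_step=2_000)
long_session = cumulative_input_tokens(100, base_context=5_000, added_per_step=2_000)
print(short_session)  # 140000
print(long_session)   # 10400000
```

Under these assumptions, a session ten times longer processes roughly 74 times more input tokens. The growth is quadratic, not linear.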

That’s why the price per token can drop while the bill goes up. It’s not a contradiction. If each unit costs less but each useful goal requires far more units, the economics can still get worse. Local efficiency doesn’t guarantee total efficiency.
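The arithmetic fits in a few lines. The prices and token counts below are invented for illustration and are not taken from any price list.

```python
def cost_per_goal(price_per_million_tokens: float, tokens_per_goal: int) -> float:
    """Cost of reaching one useful goal, given a token price and the
    number of tokens that goal ends up consuming."""
    return price_per_million_tokens * tokens_per_goal / 1_000_000


# Hypothetical "before": pricier tokens, short chat-style exchange.
before = cost_per_goal(price_per_million_tokens=10.0, tokens_per_goal=200_000)

# Hypothetical "after": tokens at half the price, agentic session using 10x more.
after = cost_per_goal(price_per_million_tokens=5.0, tokens_per_goal=2_000_000)

print(before)  # 2.0
print(after)   # 10.0
```

The unit price halved and the cost of the goal still went up fivefold, because consumption per goal grew faster than the price fell.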

This is the difference between using AI as an assistant and using it as a process. The assistant helps at specific moments. The process persists, watches, tests, corrects, and needs to remember why it’s there. That persistence is exactly what makes it so useful. It’s also what makes it so expensive.

Why it started with software development

It’s no accident that the first serious crack shows up in the tools we use to write code and run systems. Our ecosystem stacks up several conditions that make it the perfect laboratory for agents.

The first is that almost everything is text, which is what LLMs understand best. Repositories, logs, issues, tests, documentation, commands, diffs, errors, pull requests. An agent can step into any system that produces signals a machine can read.

The second is that talent is expensive. If a tool saves real hours for highly paid people, there should be economic room to pay for inference. A monthly spend of hundreds of dollars per person can sound excessive, right up until you compare it to salaries, technical debt, delays, or projects that never get started for lack of time.

The third is that the feedback loop can be partly automated. Code compiles or it doesn’t. Tests pass or fail. The linter complains. An agent can iterate against external signals, not just linguistic intuition. That makes the act-observe-correct loop far more useful.

That’s why Claude Code, Codex, Copilot, Cursor, and similar tools aren’t just another category in the productivity market. They’re a large-scale experiment in how to put AI agents to work on jobs that produce real value. Coding AI stopped being autocomplete on steroids and became a way to delegate digital labor with measurable results.

There’s no small irony in the fact that the first software seriously threatening to automate part of our work is also threatening to break the classic economic model on which software itself was built.

The missing metric: cost per accepted change

If all we look at is tokens consumed, the analysis will be a poor one. Tokens explain the mechanism, but not the value. The important question isn’t how much AI we used, but what we got out of it.

It’s not enough to know how much a team spent on AI this month. We need to know what it cost to fix a bug, land a pull request, investigate an incident, generate a test that caught a real failure, or finish a migration someone had been putting off for months. The interesting metric will be something like inference cost per unit of useful work.

An agent that generates a lot of code doesn’t necessarily create a lot of software. It can generate more review, open more fronts, introduce technical debt, or make decisions nobody has asked for. We shouldn’t just measure activity; we have to evaluate the outcome.
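One possible shape for that metric, as a minimal sketch: divide all inference spend, including the spend on work that was thrown away, by the number of outcomes someone actually accepted. The `AgentTask` record and its fields are hypothetical; what counts as "accepted" (a merged pull request, a shipped fix) is for each team to define.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AgentTask:
    name: str
    inference_cost_usd: float
    accepted: bool  # e.g. the pull request was merged (definition is an assumption)


def cost_per_accepted_change(tasks: Sequence[AgentTask]) -> Optional[float]:
    """Total inference spend divided by the number of accepted outcomes.

    Rejected work still counts toward the spend. Returns None when
    nothing was accepted, since the ratio is undefined in that case.
    """
    accepted = sum(1 for task in tasks if task.accepted)
    if accepted == 0:
        return None
    total_spend = sum(task.inference_cost_usd for task in tasks)
    return total_spend / accepted


# Invented figures: two accepted changes, one discarded refactor.
tasks = [
    AgentTask("fix-login-bug", 4.0, accepted=True),
    AgentTask("speculative-refactor", 10.0, accepted=False),
    AgentTask("add-regression-test", 1.0, accepted=True),
]
print(cost_per_accepted_change(tasks))  # 7.5
```

Counting only the two accepted tasks would suggest 2.50 dollars per change. Charging the discarded refactor to the same account gives 7.50, which is closer to what the invoice will say.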

The flat rate let you avoid drawing those distinctions. With everything included, waste hid inside a general sense of productivity. Pay-as-you-go will be more uncomfortable, but also cruelly instructive. It will show which tasks make sense and which carry a cost that’s hard to defend now that someone is watching the invoice.

The sooner we tell spending apart from value, the sooner we’ll stop burning tokens on tasks that don’t deserve them. The sooner we learn to measure the cost of solving a problem, the better we can decide which problems are worth handing to agents at all.

Every organization will have to relearn how to measure the cost of its work. Not just the human cost, but the artificial one too.

From prompt engineering to cost engineering

There was a moment when the industry turned prompt engineering into a profession, a punchline, a hope, and a consulting practice, all at once. With agents, the discipline widens. Asking well isn’t enough. You have to ask with architecture.

The new skill will look more like cost engineering for agents. Less flashy than writing a brilliant prompt, but probably more useful.

You’ll have to pick the right model for each task, not just for its capabilities but also for its cost. An agent might need a powerful model to plan a delicate refactor, but not to sort files, rewrite imports, format JSON, or summarize terminal output. The real sophistication will lie in designing processes and systems that scale capability up or down according to risk, complexity, and expected value.
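A routing rule of this kind could be sketched as follows. The tier names, the task labels, and the risk flag are all hypothetical placeholders, not the options of any real product.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"  # cheap model, good enough for mechanical work
    LARGE = "large"  # expensive model, reserved for planning and risky changes


# Hypothetical labels for tasks that rarely need deep reasoning.
MECHANICAL_TASKS = {"sort_files", "rewrite_imports", "format_json", "summarize_output"}


def pick_tier(task_kind: str, risk: str = "low") -> Tier:
    """Choose a model tier from the kind of task and its risk level."""
    if risk == "high":
        return Tier.LARGE  # risk overrides cost savings
    if task_kind in MECHANICAL_TASKS:
        return Tier.SMALL
    return Tier.LARGE      # unknown tasks default to the capable model


print(pick_tier("format_json"))               # Tier.SMALL
print(pick_tier("plan_refactor"))             # Tier.LARGE
print(pick_tier("format_json", risk="high"))  # Tier.LARGE
```

Defaulting unknown tasks to the larger model is a deliberately conservative choice: it wastes some money but avoids handing delicate work to a model that can’t do it. A team that cares more about cost could reverse that default.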

You’ll have to treat context as a scarce resource. Handing the model the entire repository, the entire history, and all the documentation can feel prudent, but it’s often just architectural laziness. Relevant context is valuable. Indiscriminate context is billable noise.
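Treating context as a budget can be as simple as the greedy sketch below. It assumes the caller already has a relevance score for each snippet (how that score is produced is out of scope here) and it estimates tokens as roughly four characters each, which is a crude heuristic and not a real tokenizer.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def select_context(snippets: List[Tuple[float, str]], budget_tokens: int) -> Tuple[List[str], int]:
    """Pick snippets in order of relevance until the token budget is used up.

    snippets: (relevance, text) pairs; relevance is supplied by the caller.
    Returns the chosen texts and the estimated tokens they consume.
    """
    chosen: List[str] = []
    used = 0
    for _relevance, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen, used


# Invented example: one large, barely relevant file does not make the cut.
snippets = [
    (0.9, "a" * 400),    # failing test, about 100 tokens
    (0.2, "b" * 4000),   # unrelated module, about 1,000 tokens
    (0.7, "c" * 800),    # function under repair, about 200 tokens
]
chosen, used = select_context(snippets, budget_tokens=350)
print(len(chosen), used)  # 2 300
```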

You’ll have to set budgets per session, per task, and per project. An autonomous task should know when to stop, summarize its progress, and ask for permission. It’s no longer just about safety; it’s also about cost. A system that doesn’t know how to stop isn’t autonomous.
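A minimal sketch of a task that knows when to stop, assuming the cost of each step can be known or estimated before it runs. `SessionBudget`, `BudgetExceeded`, and `run_steps` are hypothetical names, and the step costs are invented.

```python
from typing import Iterable, Tuple


class BudgetExceeded(Exception):
    """Raised when a charge would push spending past the limit."""


class SessionBudget:
    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        # Refuse the charge before spending, not after.
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(f"limit of {self.limit_usd} USD would be exceeded")
        self.spent_usd += cost_usd


def run_steps(step_costs: Iterable[float], budget: SessionBudget) -> Tuple[int, str]:
    """Run steps until done or until the budget says stop.

    Returns how many steps completed and a status the agent can report
    back when it asks a human for permission to continue.
    """
    completed = 0
    for cost in step_costs:
        try:
            budget.charge(cost)
        except BudgetExceeded:
            return completed, "stopped: budget reached, awaiting approval"
        completed += 1  # the real work of the step would happen here
    return completed, "finished"


budget = SessionBudget(limit_usd=1.0)
completed, status = run_steps([0.4, 0.4, 0.4], budget)
print(completed, status)  # 2 stopped: budget reached, awaiting approval
```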

And you’ll have to watch cost as naturally as we watch latency, errors, or CPU usage. Teams will need to know which prompts cause loops, which tools return too much text, which repositories are most expensive for agents, and which kinds of tasks blow up the bill without delivering enough value.
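In practice this starts with something as plain as grouping spend by whatever dimension you suspect. The sketch below assumes each agent action is logged as a small record with a cost attached; the field names and figures are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def cost_by_key(events: Iterable[Dict], key: str) -> List[Tuple[str, float]]:
    """Sum cost per value of `key` (tool, task, repository, ...),
    returned from most to least expensive."""
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        totals[event[key]] += event["cost_usd"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical log of agent actions.
events = [
    {"tool": "grep", "task": "bugfix", "cost_usd": 0.5},
    {"tool": "run_tests", "task": "bugfix", "cost_usd": 2.0},
    {"tool": "grep", "task": "migration", "cost_usd": 0.25},
]
print(cost_by_key(events, "tool"))  # [('run_tests', 2.0), ('grep', 0.75)]
print(cost_by_key(events, "task"))  # [('bugfix', 2.5), ('migration', 0.25)]
```

The same records answer different questions depending on the key: which tool returns too much text, which task type eats the budget, which repository is expensive to work in.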

All of these changes could have an upside. When context costs money, clarity pays you back. Clean APIs, reliable tests, useful documentation, and observable systems will make both humans and agents work better. The old disciplined craftsmanship we’d written off in record time could return, not out of professional romanticism, but because the cost meter punishes chaos.

What changes for small teams and large companies

For startups and small teams, coding agents are an enormous lever. They let you build faster, maintain internal tools, automate tasks that used to require hiring or waiting, and turn intent into a prototype at a new kind of speed.

But variable cost introduces fragility. The same tool that lets you move faster can create a spending structure that’s hard to absorb if every critical workflow depends on expensive models. A startup may discover that its expanded team of agents doesn’t draw a salary, but it can still run up a token bill that’s hard to justify against the value produced.

In large companies, the main problem will be political. Who decides which use of AI deserves a budget? Engineering? Finance? Each business unit? A center of excellence with a long name and weekly meetings?

Shared credits, per-team limits, and per-user budgets are only the first administrative interface. Behind them will come internal cost models, spend dashboards, approval policies for extraordinary expenses, ROI audits, and debates about why one team needs more artificial reasoning this quarter.

A visible meter doesn’t just measure; it changes behavior too. Delegating to agents will remain normal; operating them efficiently will be the differentiator.

Measured intelligence also measures power

The idea of buying intelligence with meters whose digits make you nervous isn’t only a matter of corporate budgets. It has social consequences.

If the best models, the longest contexts, the most capable agents, and the deepest integrations cost more, then access to that expanded cognitive capacity will be unequal. Not just between those who have the tool and those who don’t, but between those who can afford premium reasoning and those who make do with the cheap, capped, or already-exhausted version.

In education, that can mean tutors of wildly different quality. At work, employees with unequal capabilities. Among companies, a gap that decides which ones get to compete. Among countries, technological dependence on the providers that control the models, the prices, the policies, and the availability.

The criticism of the electricity metaphor lands here. Electricity powers tools. Artificial intelligence is starting to take part in decisions. If it becomes basic infrastructure, its governance matters. Who measures, who charges, who audits, who gets access, and who can walk away to another provider are not secondary questions.

The flat rate didn’t solve any of this either. It just hid it behind a friendly, falsely democratizing fee.

An uncomfortable maturity

There’s no point romanticizing the past, either. The flat rate was useful, fun, and probably necessary for millions of people to discover what they could do with AI. But it was also a distortion. It got us used to never looking at the cost of processes that always cost far more than what we were paying.

Pay-as-you-go will bring friction and discord. Many users will feel that something already theirs has been taken away, even though it was really an extended promotion. AI agents won’t become less important because we have to measure them. Quite the opposite: we’ll measure them because they finally matter.

Maybe that’s the clearest sign that AI has stopped being an experiment and become an integral part of how we work. This isn’t the end of AI; it’s the end of the era of AI as a toy. And that’s precisely why the numbers now have to add up.

Sources

GitHub Copilot is moving to usage-based billing (github.blog)
