Skip to content
The Token Billing Crisis That Cancelled Claude Code at Microsoft
EconomicsMay 23, 20268 min read

The Token Billing Crisis That Cancelled Claude Code at Microsoft

Microsoft cancelled Claude Code for 100K+ devs by June 30. Token billing burned their annual AI budget. Uber hit the same wall. What every team needs to know.

Microsoft's Experiences + Devices division -- the teams building Windows, Microsoft 365, Teams, and Surface -- is shutting off Claude Code for all developers by June 30, 2026. The cause: token-based billing burned through their annual AI budget in months. Uber hit the same wall four months into the year. This is the pattern now.

I've watched this happen twice in the same month. Two enterprises, same tool, same outcome. The math isn't hard to see once you look at it -- but most teams never look until the invoice lands.

What actually happened at Microsoft with Claude Code?

Microsoft's Experiences + Devices division -- the group building Windows, Microsoft 365, Outlook, Teams, and Surface -- opened Claude Code access in December 2025, extending it not just to developers but to product managers and designers. Adoption was fast. By spring, the tool was popular enough that EVP Rajesh Jha had to make a call.

The call: shut it down by June 30, 2026 and migrate everyone to GitHub Copilot CLI. Jha framed it as "benchmarking-then-convergence" -- Microsoftran the experiment, got the data, and is standardizing. According to Windows Central and AI Weekly, the real driver was budget: token-based billing burned through the annual AI allocation in months, and June 30 is Microsoft's fiscal year close. Cleaning up the external Claude Code seats before the new fiscal year starts is accounting hygiene at scale.

The Microsoft-Anthropic relationship isn't over. The November 2025 Foundry agreement stays intact, and Claude models remain accessible through Copilot CLI. What's cancelled is the standalone Claude Code interface -- the one with the usage-based billing that finance couldn't predict.

Free Newsletter

Get the daily AI agent signal in your inbox.

One email, every morning. The builds, tools, and frontier research that matter — no fluff, no AI hype cycle noise.

Subscribe free

Why did Uber hit the same wall four months into 2026?

Uber adopted Claude Code in December 2025 -- the same month as Microsoft. By April, they'd burned through their entire 2026 AI budget with three quarters of the year remaining. Per Storyboard18, 84% of Uber engineers had become agentic coding users by March, and internal leaderboards ranked engineers on Claude Code activity. The company had incentivized the exact behavior driving the cost spike.

The scale of adoption was real: 95% of Uber engineers used AI tools every month by that point, and 70% of committed code was coming from those systems. That's not a prototype -- it's a full deployment running without a cost model underneath it. The usage leaderboards amplified consumption past any forecast the budget team had built.

Two companies, same tool, same month of adoption, same outcome. Different industries, different team sizes, different use cases -- same structural gap: the cost model didn't scale with the rollout.

Why does token billing catch enterprise teams off guard?

Token billing looks like SaaS pricing but doesn't behave like it. A flat 00/seat plan creates a predictable budget line. Token consumption creates exposure that scales with how intensely the tool gets used -- and agentic workflows are designed to use a lot of tokens.

The enterprise averages, per CloudZero's 2026 pricing analysis, sit at roughly 3 per developer per active day and 50-250 per developer per month. That's a developer doing mixed work -- code review, autocomplete, some generation. For developers running Claude Code as an agent platform -- multi-file edits, background task delegation, automated PR review -- costs run 00-2,000 per month. For 90% of users, per-active-day costs stay under 0. The other 10% don't.

The problem is that most enterprise budget models treat AI tools like SaaS: headcount times seat price. Token billing means the real number is headcount times usage intensity times session depth. The difference between the two models isn't 10% -- it can be 5x at the tail.

At the API layer, the numbers get concrete fast. Claude Opus 4.7 (now powering Claude Code's /fast mode as of v2.1.150) costs per million input tokens and 5 per million output tokens. A single complex agentic session consuming 500K input tokens and 200K output tokens costs .50. Run 100 developers at 20 agentic sessions per working day and you're at .5M/month -- which is close to what Peter Steinberger's 3-person team spent running 100 Codex agents for 30 days. His bill: ,305,088.81, covering 603 billion tokens across 7.6 million requests. OpenAI paid it -- he works there. Your CFO won't have that arrangement.

Want this built for your business?

Venti Scale builds AI automation systems for businesses that want results without the learning curve. One operator, AI-powered, full marketing stack.

See What We Build

What does the governance gap actually look like?

Claude Code ships with a /cost command that shows session-level spend per developer. The data stays local on each machine. There's no built-in centralized view of who's consuming what, which models are driving costs, or how spend breaks down across teams and projects. Enterprise admins can set per-user spend caps -- but that's a limit after the fact, not visibility before it.

The tools available today are good for individual accountability. They weren't built for fleet-wide cost governance. An AI gateway -- intercepting API traffic before it reaches the provider -- gives you centralized logging, per-team budget enforcement, and the model-level breakdown finance needs. Without that layer, you're reconciling token costs in arrears.

One signal worth noting: Claude Code v2.1.150 shipped today (May 23) with plugin token cost visibility -- before you run a plugin, Claude Code now shows a projected per-session token cost. It's a small feature in a 418-package plugin ecosystem. But it's the right direction: show the cost before the commit, not after. Anthropic sees this problem clearly. The tooling is catching up to it.

What every engineering team should do before the CFO asks

The teams getting hit are the ones that ran fast on adoption and didn't build the cost model at the same speed. Here's what I'd run through before the budget conversation happens at the executive level:

Measure what you're actually spending. Install ccusage on every developer machine and get real numbers, not estimates. If your enterprise plan includes admin dashboards, audit them weekly. You need a baseline before you can govern anything.

Find your heavy users. Agentic workflows cost 5-10x more than interactive use. In most teams, 20% of developers drive 80% of token consumption. Find that group. Understand what they're building. Some of it's high-leverage; some of it's habit.

Match model to task. Opus 4.7 at /5 per million tokens is for the hard problems. Sonnet 4.6 at /5 handles most engineering work. Haiku 4.5 at / is fast enough for autocomplete and simple lookups. Most Claude Code configurations default up. Setting team-level model defaults -- Sonnet for standard tasks, Opus only when complexity warrants it -- is the single highest-leverage cost reduction you can make without changing workflow.

Set per-user spend limits proactively. Enterprise admins can cap monthly spend per user. A 00/month default cap with a review process to raise it for power users is more sustainable than waiting for an overage alert. Build the ceiling before you need it.

Add an AI gateway at 50+ developers. Per-machine tools like ccusage work for small teams. At 50+ developers, you need centralized traffic interception, model-level cost breakdowns, and per-project budget enforcement. An AI gateway pays for itself in the first billing cycle where you catch an outlier running agents overnight.

FAQ

Did Microsoft cancel its entire Anthropic partnership?

No. Microsoft's broader Anthropic relationship -- including the November 2025 Foundry agreement and Copilot Cowork inside Microsoft 365 Copilot -- stays intact. Claude models remain accessible through GitHub Copilot CLI for affected developers. The cancellation is specifically for standalone Claude Code licenses in the Experiences + Devices division. The access continues; the billing model changes.

How much does Claude Code actually cost for an enterprise team?

Enterprise averages run 3 per developer per active day and 50-250 per developer per month for mixed usage. Heavy agentic users -- running multi-file background tasks and automated review pipelines -- cost 00-2,000 per month. Token billing means actual spend scales with usage intensity, not headcount. A team of 500 with 20% heavy users can run 00K-M/year depending on agentic adoption rate.

What spend controls does Claude Code offer for enterprise deployments?

Claude Code's /cost command shows session-level spend per developer, stored locally with no centralized aggregation. Enterprise admins can set per-user monthly spend caps. The v2.1.150 update (May 23) added plugin token cost preview before execution. For fleet-wide governance, an AI gateway intercepting API traffic is the standard approach. ccusage is the go-to per-machine monitoring tool for smaller teams.

Could this same billing crisis happen with Codex or Gemini CLI?

Yes. Any token-billed AI coding tool -- Codex, Gemini CLI, or direct API wrappers -- has the same cost exposure as Claude Code at scale. GitHub Copilot's flat per-seat model is why Microsoft is standardizing on it: cost predictability matters at enterprise budget cycles, even if peak capability is lower. The tradeoff is real. Most enterprise finance teams will take predictable over optimal.

Want this built for your business?

Venti Scale builds AI automation systems for businesses that want results without the learning curve. One operator, AI-powered, full marketing stack.

See What We Build
AI Agents First

The daily signal from the frontier of AI agents.

Join builders, founders, and researchers getting the sharpest one-email read on what's actually shipping in AI — every morning.

No spam — unsubscribe anytime