
STRATEGIC INSIGHTS

The AI Headcount Trade-Off Was Built On Wrong Numbers

A deeper look into the hidden costs, consumption multipliers, and the reality of the agentic shift.

In 2024, the business case for replacing headcount with AI looked almost too clean. User-facing costs were low, capability was rising, and every boardroom had a slide showing the maths. Klarna showed its work loudly and publicly, replacing the equivalent of 700 customer service roles and crediting AI for the change. The press published it. Investors approved. Other organisations quietly ran the same calculation. As it turns out, the problem was not the maths. It was the inputs.

Klarna's CEO, Sebastian Siemiatkowski, has since admitted that cost was an evaluation factor that was too predominant and that quality suffered as a result. The company is now rehiring human staff. The cost savings projected at the announcement did not materialise. Rehiring costs have exceeded the original savings estimate. What looked like a decisive strategic move has become the most documented cautionary tale in enterprise AI adoption.

The Swedish fintech player is not an outlier. It is the organisation that was honest about what most organisations are quietly discovering. None of this means AI-led automation is wrong. In many functions, it will be right. But the threshold for replacing headcount is higher than the threshold for augmenting work. Once people leave, the business case is no longer a software ROI calculation. It becomes an operating model bet.

700: customer service roles initially replaced by Klarna AI

13x: growth in average monthly token spend since Jan 2025

78%: IT leaders reporting unexpected consumption charges

The Inputs Were Always Incomplete

Klarna's calculation wasn't uniquely flawed. Every business case built in that window shared the same blind spot: the AI they were pricing wasn't priced at cost. It was priced to acquire them. AI providers like OpenAI and Anthropic prioritised user scale over unit economics. Adoption was the metric. Revenue would follow.

But behind every cheap-feeling AI application was a pricing model that bore no relationship to true cost. Anthropic began moving away from flat-rate enterprise pricing toward per-token billing precisely because agentic usage had broken the flat-rate economics. A session that consumes thousands of tokens in conversational use can consume millions in agentic use.

[Figure: token consumption growth]

Per-token prices have actually fallen across the industry. Competition has driven costs down significantly. And here lies the real complication: the price per token is not the problem. Total consumption is. Since January 2025, average monthly AI token spend across enterprise customer bases has grown 13-fold, not because the price went up, but because usage patterns changed fundamentally as organisations moved to agentic workflows.

Another complication arises through model upgrades rather than price changes. A recent tokeniser update from Anthropic left per-token prices unchanged but increased token consumption per request by up to 35%. Organisations that did not benchmark their workloads before migrating discovered this through their invoices rather than their release notes. This is the hidden cost increase: not a price rise, but a consumption multiplier dressed as a capability improvement.
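The arithmetic behind that consumption multiplier can be sketched in a few lines. This is a minimal illustration, not a billing model: the price and the baseline token count are hypothetical placeholders, and only the 35% figure comes from the text above, where it is an upper bound.

```python
def cost_per_request(tokens: int, price_per_million: float) -> float:
    """Cost of one request given its token count and a per-million-token price."""
    return tokens * price_per_million / 1_000_000


# Hypothetical figures for illustration only.
PRICE_PER_MILLION = 10.0   # assumed price, unchanged across the migration
TOKENS_BEFORE = 20_000     # assumed tokens per request before the update
TOKEN_INFLATION = 0.35     # the "up to 35%" upper bound cited above

tokens_after = round(TOKENS_BEFORE * (1 + TOKEN_INFLATION))
before = cost_per_request(TOKENS_BEFORE, PRICE_PER_MILLION)
after = cost_per_request(tokens_after, PRICE_PER_MILLION)
increase = (after - before) / before

print(f"cost before: {before:.4f}, cost after: {after:.4f}, increase: {increase:.0%}")
```

With the price held constant, the cost per request rises by the same proportion as the token count, which is why the change shows up on the invoice rather than on the price list.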

The infrastructure cost pressure extends beyond tokens. As agentic AI shifts the ratio of CPUs to GPUs in data centres from 1:8 toward 1:1, server CPU prices have risen by as much as 20% since March 2026, with further increases expected. Yet none of this appeared in any business case written before 2025. The invoice is arriving now.

These are different cost pressures, but they point in the same direction: AI costs are becoming less predictable just as organisations are making more permanent workforce decisions around them.

The cumulative effect is predictable. 78% of IT leaders report unexpected charges from consumption-based AI pricing models, with actual costs frequently exceeding initial estimates by 30 to 50%. This is not a technology problem. It is a modelling and governance problem. And it is the problem that most headcount replacement business cases were never designed to detect.

Where Does Your Organisation Actually Sit?

The Untested

Organisations operating on flat-rate or early per-token pricing and using AI primarily for drafting and summarisation. The economics work because the headcount argument has not been stress-tested yet.

The Committed

Organisations that have moved to agentic adoption and made headcount decisions. Token consumption has exploded beyond projections, and budget overruns have arrived before governance.

The Exposed

Organisations that made headcount decisions on Untested economics but are operating at Committed consumption levels. Quality monitoring costs have arrived, and the replacement is performing below assumed thresholds.

Understanding which category your organisation sits in requires going back to the assumptions the original business cases were built on.

The Four Numbers Missing from the AI Replacement Business Case

If you recognise your organisation in any of those three descriptions, here is what the correct cost calculation actually includes. Four inputs are consistently absent from AI replacement business cases. None of them are unknowable. All of them are uncomfortable.

1) Token consumption at agentic scale

Most organisations have the conversational baseline. Almost none have modelled what happens when AI stops answering questions and starts executing tasks. The gap between those two scenarios is not incremental. It is an order of magnitude. The organisations managing this well treat AI spend the same way mature cloud operations treat infrastructure: with dedicated FinOps discipline, guardrails against runaway consumption, and net present value (NPV) calculations before any workflow automation is deployed.

Diagnostic question: Has the business case been rerun against actual agentic consumption data from the last 90 days? If not, the model is still pricing a different workflow. Pull your token logs from the last 90 days. Compare consumption before and after any agentic workflow was introduced. The gap is the number your business case was missing.
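The before-and-after comparison described above could look something like the sketch below. It assumes token logs have already been aggregated into daily totals. The function name, the log structure, and every figure are hypothetical, chosen only to show the shape of the calculation.

```python
from datetime import date
from statistics import mean


def consumption_gap(daily_tokens: dict[date, int], rollout: date) -> float:
    """Ratio of average daily tokens after the rollout date to the average before it."""
    before = [tokens for day, tokens in daily_tokens.items() if day < rollout]
    after = [tokens for day, tokens in daily_tokens.items() if day >= rollout]
    if not before or not after:
        raise ValueError("need log data on both sides of the rollout date")
    return mean(after) / mean(before)


# Hypothetical daily totals; a real log would cover the full 90-day window.
example_logs = {
    date(2025, 1, 1): 1_000_000,
    date(2025, 1, 2): 1_200_000,
    date(2025, 1, 3): 11_000_000,  # agentic workflow introduced on this day
    date(2025, 1, 4): 11_000_000,
}
gap = consumption_gap(example_logs, rollout=date(2025, 1, 3))
print(f"consumption multiplier after rollout: {gap:.1f}x")
```

A ratio well above 1 suggests the original case was priced against a different workflow. Daily averages are used here so that unequal periods before and after the rollout remain comparable.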

2) Quality monitoring and intervention costs

Agentic AI at scale produces errors and edge cases that conversational AI rarely encounters. Someone reviews the outputs, catches the failures, and intervenes on the complexity the model cannot handle. Most organisations discover this cost retrospectively rather than prospectively, through customer complaints or quality reviews rather than financial modelling. By the time it appears in the numbers, it has already appeared in the customer relationship.

Diagnostic question: What is the fully loaded cost of human review, escalation, and intervention on AI outputs?

"Klarna's CEO has since admitted that cost was an evaluation factor that was too predominant and that quality suffered as a result."
— SEBASTIAN SIEMIATKOWSKI, KLARNA CEO

3) The cost of unwinding a failed replacement

This is the input almost nobody includes because admitting the possibility of failure is politically uncomfortable in a boardroom. Recruitment, onboarding, and the reconstruction of institutional knowledge carry real costs. A reasonable proxy: calculate the full onboarding cost of replacing each eliminated role at current market rates, including time to productivity. If that number doesn't change the decision, the replacement is probably sound. If it does, the business case was never complete. Klarna's reversal shows why that cost needs to be modelled before the decision, not after it.

Diagnostic question: What would it cost to reverse the decision, rebuild the team, and recover lost institutional knowledge?
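One way to put a number on that proxy is sketched below. The cost components follow the text (recruitment, onboarding, time to productivity), but the exact breakdown, the field names, and all figures are assumptions for illustration. The cost of rebuilding institutional knowledge is deliberately left out because it is harder to quantify.

```python
from dataclasses import dataclass


@dataclass
class UnwindAssumptions:
    """Hypothetical inputs for reversing a headcount replacement."""
    roles: int                        # number of eliminated roles to rebuild
    recruitment_cost_per_role: float  # agency fees, advertising, interview time
    onboarding_cost_per_role: float   # training, equipment, supervision
    monthly_salary: float             # fully loaded monthly cost at market rates
    ramp_months: float                # time to full productivity
    ramp_productivity: float          # average productivity during ramp, 0 to 1


def unwind_cost(a: UnwindAssumptions) -> float:
    """Total cost of rehiring, including salary paid for output not yet delivered."""
    lost_output = a.monthly_salary * a.ramp_months * (1 - a.ramp_productivity)
    per_role = a.recruitment_cost_per_role + a.onboarding_cost_per_role + lost_output
    return a.roles * per_role


# Illustrative figures only.
example = UnwindAssumptions(
    roles=100,
    recruitment_cost_per_role=5_000.0,
    onboarding_cost_per_role=3_000.0,
    monthly_salary=4_000.0,
    ramp_months=3.0,
    ramp_productivity=0.5,
)
total = unwind_cost(example)
print(f"estimated unwind cost: {total:,.0f}")
```

Setting this total against the projected savings is the test described above: if the decision survives the comparison, the replacement is probably sound.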

4) Model selection governance

This is the most immediately actionable component and the one most organisations are not managing. Routing simple tasks to lightweight models and reserving frontier models for genuinely complex reasoning is not a technical decision. It is a financial governance decision. A client in management consulting observed that organisations without systematic model selection governance are leaving 40 to 60% of their potential AI spend savings on the table. Variable token consumption without governance creates a cost line nobody can predict month to month. The governance problem extends beyond model selection: enterprise AI budgets are routinely consumed by ungoverned personal use and shadow AI that nobody is tracking. Access controls by function are not optional. They are the first line of cost defence.

Diagnostic question: Are tasks actively routed to the right model tier, or does the organisation default to the most powerful model available?
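A minimal sketch of tier-based routing follows, assuming tasks arrive already labelled by type. The tier names, task labels, prices, and workload are all hypothetical, and a real router would need a more careful measure of task complexity than a lookup table.

```python
from enum import Enum
from typing import Callable


class Tier(Enum):
    LIGHT = "light"
    FRONTIER = "frontier"


# Hypothetical per-million-token prices for each tier.
PRICE_PER_MILLION = {Tier.LIGHT: 1.0, Tier.FRONTIER: 15.0}

# Hypothetical task types judged simple enough for the lightweight tier.
SIMPLE_TASKS = {"summarise", "classify", "draft"}


def route(task_type: str) -> Tier:
    """Send simple tasks to the light tier and everything else to the frontier tier."""
    return Tier.LIGHT if task_type in SIMPLE_TASKS else Tier.FRONTIER


def spend(workload: list[tuple[str, int]], router: Callable[[str], Tier]) -> float:
    """Total cost of a workload of (task_type, tokens) pairs under a routing policy."""
    return sum(
        tokens * PRICE_PER_MILLION[router(task)] / 1_000_000
        for task, tokens in workload
    )


workload = [
    ("summarise", 2_000_000),
    ("classify", 1_000_000),
    ("multi_step_plan", 1_000_000),
]
default_spend = spend(workload, lambda _task: Tier.FRONTIER)  # always the top model
routed_spend = spend(workload, route)
print(f"default: {default_spend:.2f}, routed: {routed_spend:.2f}")
```

The size of the gap in this toy example is a product of the made-up prices and workload mix. The point is only that the routing policy, not the price list, determines the total.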

Five Immediate Actions

1. Rerun the Business Case

Rerun the case against two additional scenarios: agentic adoption and agentic at scale. If the original headcount decision was made against a conversational baseline, the economics are already outdated.

2. Pilot Before Commitment

Deploy AI replacement in one contained function before scaling. The real economics surface faster than any model predicts.

3. Build Token Governance

Train teams to match task complexity to model capability, restrict access by function, and treat AI consumption as a managed financial line item.

4. Benchmark Workloads

Benchmark workloads before any model migration. A model upgrade that leaves the price list unchanged can still increase the effective cost per request.

5. Model the Unwind Cost

Model the unwind cost explicitly. Every replacement case should include a scenario where the organisation reverses course. That calculation belongs in the board presentation, not in the risk register nobody reads.
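As a rough illustration of the first action, the sketch below reruns a simple case across the conversational baseline and the two agentic scenarios. Every input is a hypothetical placeholder, including session volume, tokens per session, price, and headcount savings. The token counts merely echo the thousands-versus-millions contrast described earlier.

```python
def annual_ai_cost(sessions_per_month: int, tokens_per_session: int,
                   price_per_million: float) -> float:
    """Annual token spend for a given session volume and tokens per session."""
    return 12 * sessions_per_month * tokens_per_session * price_per_million / 1_000_000


# Hypothetical inputs for illustration only.
SESSIONS_PER_MONTH = 100_000
PRICE_PER_MILLION = 5.0
ANNUAL_HEADCOUNT_SAVINGS = 5_000_000.0

# Assumed tokens per session in each scenario.
scenarios = {
    "conversational": 5_000,
    "agentic adoption": 200_000,
    "agentic at scale": 2_000_000,
}

net = {
    name: ANNUAL_HEADCOUNT_SAVINGS
    - annual_ai_cost(SESSIONS_PER_MONTH, tokens, PRICE_PER_MILLION)
    for name, tokens in scenarios.items()
}
for name, value in net.items():
    print(f"{name:>18}: net annual saving {value:>14,.0f}")
```

Under these made-up numbers the case is comfortably positive on the conversational baseline and negative at agentic scale. A fuller version would also subtract quality monitoring costs and a probability-weighted unwind cost.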

The organisations navigating this period most effectively are not the ones with the most sophisticated AI. They are the ones that modelled the economics correctly before they made irreversible workforce decisions.

The checklist above takes an afternoon.

The conversation it prevents takes considerably longer.

Written by Pedro Correia