Did Klarna actually fire 700 people because of AI?

Not directly. Klarna stopped backfilling roles between 2022 and 2024 as attrition reduced headcount by roughly 700, and the AI assistant absorbed the volume those roles used to handle. CEO Sebastian Siemiatkowski reversed course publicly in May 2025 after customer satisfaction dropped, and the company is now rehiring.

Is the Klarna story an argument against AI agents?

No. It's an argument against using cost savings as the primary success metric for a customer-facing agent. The agents that survive in 2026 are the ones measured on customer outcomes, with humans kept in the loop for escalation.

What does 'human in the loop' actually mean for a 2026 agent?

It means a defined escalation tier (sentiment-triggered, confidence-triggered, or topic-triggered) where the agent hands off to a human and the human's resolution feeds back into the agent's training data.

Klarna unwound its AI customer service: three lessons for any operator deploying agents in 2026

Klarna said its OpenAI-powered assistant was doing the work of 700 customer service agents, then reversed course in 2025. Three lessons for operators and CTOs scoping their own AI agent builds in 2026.

May 6, 2026 · 8 min read · by Neuralhewn

In May 2025, Klarna CEO Sebastian Siemiatkowski told Bloomberg that the company was rehiring human customer service agents, a year after he had publicly bragged that Klarna's OpenAI-powered assistant was doing the work of 700 people. A year later, in 2026, the case is still the most-cited cautionary tale in AI customer service. It's worth being precise about what actually went wrong, because most retellings get it backwards.

What actually happened at Klarna

The widely reported version of the Klarna story compresses two years of decisions into a single dramatic reversal. The reality is messier.

Between 2022 and 2024, Klarna stopped replacing customer service representatives who left, while volume was redirected to an AI assistant developed in partnership with OpenAI. In February 2024, Klarna's press release claimed the assistant handled 2.3 million conversations in its first month, did the work of 700 human agents, and resolved tickets in two minutes versus the 11 minutes humans took. The number that got the least attention in that press release: customer satisfaction, which appeared only as a one-line claim of parity with human agents, with no figure attached.

By spring 2025, Siemiatkowski admitted to Bloomberg that "cost unfortunately seems to have been a too predominant evaluation factor when organizing this." The company began piloting what it called an "Uber-style" workforce of remote part-time agents (students, parents, rural workers) to bring human capacity back online. The AI assistant didn't get switched off. It got demoted from a replacement to a triage layer.

That's the actual story: not "AI doesn't work for customer service," but "AI replacing customer service without keeping human escalation paths intact creates a worse product."

Lesson 1: The success metric you announce is the success metric you'll get

Metric class	Example metric	What it incentivizes
Cost-out	"Replaces N human agents"	Cutting headcount before measuring customer impact
Speed	"Resolves in 2 minutes vs 11"	Closing tickets fast, not solving problems
Volume	"Handled 2.3M chats"	Throughput regardless of resolution
Customer outcome	NPS, CSAT, retention, repeat-contact rate	Genuine quality
Resolution quality	First-contact resolution, escalation rate	Right answers, right path

Klarna's announced metrics in early 2024 were almost entirely from the top three rows. The metrics it had to scramble to recover by mid-2025 were almost entirely from the bottom two rows. This isn't a coincidence. It's a structural property of how teams optimize.

If you're scoping an AI customer service deployment in 2026, write down the customer-outcome metrics before you pick the technology, and require your vendor to report against them on day one. Cost savings without retention data is fiction.
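To make "report against them on day one" concrete, here is a minimal sketch of a customer-outcome report computed from raw ticket records. The `Ticket` fields, the `outcome_report` name, and the convention that CSAT is the share of survey scores of 4 or 5 are all assumptions for illustration, not anything Klarna or a particular vendor uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ticket:
    """Hypothetical ticket record; field names are illustrative."""
    customer_id: str
    resolved_first_contact: bool  # closed without a follow-up on the same issue
    escalated: bool               # handed from the agent to a human
    csat: Optional[int]           # 1-5 survey score, None if the customer skipped it
    repeat_contact: bool          # customer came back about the same issue


def outcome_report(tickets: list[Ticket]) -> dict[str, float]:
    """Return the bottom-two-row metrics from the table as rates between 0 and 1."""
    if not tickets:
        raise ValueError("no tickets in the reporting window")
    n = len(tickets)
    rated = [t.csat for t in tickets if t.csat is not None]
    return {
        "first_contact_resolution": sum(t.resolved_first_contact for t in tickets) / n,
        "escalation_rate": sum(t.escalated for t in tickets) / n,
        "repeat_contact_rate": sum(t.repeat_contact for t in tickets) / n,
        # Assumed convention: CSAT is the share of responses scoring 4 or 5.
        "csat": sum(s >= 4 for s in rated) / len(rated) if rated else float("nan"),
    }
```

Nothing in this report mentions headcount or handle time. Those can be tracked too, but keeping them out of the primary report is the design choice the lesson argues for.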

Lesson 2: Replacement is the wrong default. Augmentation is the right default.

The 2024 Klarna positioning was a replacement story: the AI does what humans used to do, the humans go away. The 2026 default (and the framing most enterprise AI guidance has converged on) is augmentation: the AI handles tier-zero, the humans handle escalation, and the routing logic between them is the actual product.

Three concrete patterns we see working in production:

Agent-first triage with sentiment-based escalation. The AI handles the conversation by default, but a sentiment classifier monitors tone in parallel. Frustration above a threshold instantly routes to a human, often without the customer needing to ask.
Confidence-based handoff. The agent estimates confidence in its own answer (an LLM's self-reported confidence is a usable signal when prompted for explicitly, provided the threshold is calibrated against real outcomes), and below a threshold the conversation is silently routed to a human queue with full context already attached.
Topic-bounded automation. The agent handles known categories (order status, refund eligibility, password resets) and only those. Anything outside the bounded list is escalated, not improvised.

What none of these patterns does is fire the human team. The human team's job becomes the long tail of hard cases plus continuous correction of the agent's bad answers, which becomes training signal.
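The three patterns above can be combined into a single routing decision. The sketch below is one way to do it, under stated assumptions: the threshold values, the topic names, and the `Turn`, `Decision`, and `route_turn` names are hypothetical, and the frustration and confidence scores are assumed to arrive from a separate classifier and from the agent itself.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AGENT = "agent"
    HUMAN = "human"


# All three values are illustrative and would need tuning on real traffic.
ALLOWED_TOPICS = frozenset({"order_status", "refund_eligibility", "password_reset"})
FRUSTRATION_LIMIT = 0.7
CONFIDENCE_FLOOR = 0.8


@dataclass(frozen=True)
class Turn:
    topic: str          # output of an upstream topic classifier (assumed)
    frustration: float  # 0..1, from a sentiment classifier running in parallel
    confidence: float   # 0..1, the agent's estimate for its drafted answer


@dataclass(frozen=True)
class Decision:
    route: Route
    reason: str  # logged so escalation reasons can be reviewed later


def route_turn(turn: Turn) -> Decision:
    """Escalate on the first rule that fires; otherwise the agent answers."""
    if turn.topic not in ALLOWED_TOPICS:
        return Decision(Route.HUMAN, "topic_out_of_bounds")
    if turn.frustration > FRUSTRATION_LIMIT:
        return Decision(Route.HUMAN, "frustration")
    if turn.confidence < CONFIDENCE_FLOOR:
        return Decision(Route.HUMAN, "low_confidence")
    return Decision(Route.AGENT, "within_bounds")
```

The topic check runs first so the agent never improvises outside the bounded list, even when it is confident. Recording the reason alongside the route is what lets the human team's corrections be grouped by failure type.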

Lesson 3: A customer-service agent is a product, not a project

The single sentence that most distinguishes successful 2026 deployments from Klarna-style outcomes: the agent has an owner who is measured on customer outcomes.

Failed deployments have an owner who is measured on shipping the agent. The day it goes live the project is "done" and the team scatters. Six months later customer satisfaction has degraded silently and nobody has noticed, because nobody owns the steady-state metric.

Successful deployments treat the agent as an internal product with:

A weekly review cadence of escalation rate, confidence distribution, and CSAT.
A documented retraining loop: bad answers from last week become evaluation cases this week.
A rollback plan: if a metric drifts, the agent is dialed back in scope, not torn out.
A named owner who keeps that job for at least a year after launch.

Klarna's 2024 deployment had none of these structural commitments because it was framed as a cost program, and cost programs disband when the savings are booked. Anything you ship in 2026 that's customer-facing should be framed as a product program from day one.
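The rollback commitment is the easiest of the four to express in code. As a rough sketch, assuming weekly per-topic metrics are already collected, a review step can pause only the topics that have drifted and leave the rest of the agent running. The floor and ceiling values and the `TopicWeek` and `dial_back` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicWeek:
    """One topic's numbers for the weekly review (illustrative shape)."""
    topic: str
    csat: float             # share of satisfied survey responses, 0..1
    escalation_rate: float  # share of conversations handed to a human, 0..1


# Illustrative thresholds; a real deployment would set these per topic.
CSAT_FLOOR = 0.80
ESCALATION_CEILING = 0.35


def dial_back(active_topics: set[str], review: list[TopicWeek]) -> tuple[set[str], list[str]]:
    """Return (topics that stay automated, topics paused this week)."""
    paused = sorted(
        week.topic
        for week in review
        if week.topic in active_topics
        and (week.csat < CSAT_FLOOR or week.escalation_rate > ESCALATION_CEILING)
    )
    return active_topics - set(paused), paused
```

A paused topic simply falls through to the human queue until the owner has looked at what went wrong, which is the difference between dialing back in scope and tearing the agent out.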

Where the Klarna story is being misused in 2026

Two narratives are running on this case right now and both are wrong.

The first is the gleeful "AI doesn't work" narrative, mostly from people who never wanted AI to work in the first place. This isn't supported by Klarna's own actions. The AI assistant is still running; it's just no longer the sole channel.

The second is the dismissive "Klarna just executed badly" narrative from AI vendors who don't want their pipeline disturbed. This is technically true and substantively misleading. Klarna's execution was bad in the same ways most early enterprise AI deployments are bad. Treating the case as a one-off lets the next dozen Klarnas happen on schedule.

The honest read: AI customer service works, but only when the operating model around it changes. If you keep the org chart of a 2022 contact center and bolt an LLM into it, you get the Klarna outcome.

The deployment shape that ships well in 2026

Across the deployments we see working (in retail, in fintech, in B2B SaaS), the shape is consistent:

Tier 0: Agent handles 50–70% of volume on bounded topics with confidence gating.
Tier 1: Human handles ambiguous, emotional, or out-of-bounds conversations, with full agent context handed off.
Tier 2: Specialist team for compliance, refunds above threshold, account changes.
Continuous loop: Tier 1 corrections feed the Tier 0 evaluation set weekly.
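The continuous loop in the last line can be sketched in a few lines, with the caveat that everything here is an assumption for illustration: the `EvalCase` shape, the function names, and especially the exact-match comparison, which a real evaluation would likely replace with a rubric or a human grader.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    question: str
    expected: str  # the answer the Tier 1 human gave when correcting the agent


def add_corrections(eval_set: list[EvalCase], corrections: list[EvalCase]) -> list[EvalCase]:
    """Merge this week's corrections; a newer correction replaces an older case for the same question."""
    merged = {case.question: case for case in eval_set}
    merged.update({case.question: case for case in corrections})
    return list(merged.values())


def pass_rate(agent: Callable[[str], str], eval_set: list[EvalCase]) -> float:
    """Share of cases where the agent's answer matches the human answer.

    Matching is case-insensitive exact match, a deliberately crude stand-in.
    """
    if not eval_set:
        return 1.0
    hits = sum(
        agent(case.question).strip().lower() == case.expected.strip().lower()
        for case in eval_set
    )
    return hits / len(eval_set)
```

Run weekly, `pass_rate` on the merged set gives the owner a number that moves when the agent regresses on cases humans have already fixed once.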

The agent handles more volume over time. Tier 1 headcount shrinks slowly and intentionally, measured against retention metrics, not as a cost target. Some companies end up with fewer customer service staff. Many end up with the same number doing higher-quality work on harder problems. Almost none end up with the dramatic 700-headcount-replacement story Klarna told in 2024.

That's not a failure of AI. That's what success looks like when the metrics are honest.

If you're scoping an agent build right now

A few practical filters before signing anything:

Insist that your vendor write down customer-outcome metrics before pricing the build.
Insist on a documented escalation logic and a documented confidence-gating policy.
Insist that the system can be dialed back in topic scope without rebuilding.
Insist on weekly metric reporting for the first six months.
Don't sign a deployment that's framed as a headcount program. Reframe it.

We've shipped agent builds that started in the customer service surface and quietly extended into operations, sales follow-up, and account management. The deployments that worked all started with a small bounded topic, a clear human escalation tier, and a customer-outcome metric. The ones that didn't all started with a cost-savings deck.

If you're trying to scope something specific and want a second opinion before you sign, or if you're staring at a Klarna-shaped roadmap and want help reshaping it, book a free 20-minute call. We don't bring a deck. We ask questions, look at your actual flow, and tell you whether the build you're considering is one you'll regret in 12 months.

Written by Neuralhewn · Engineering team

Neuralhewn is an engineer-led AI automation agency in Toronto, working worldwide. We build custom AI agents and automations as real code our clients own — so these guides come from production work, not theory.
