McKinsey & Company – one of the most prestigious and well-resourced consulting firms on the planet – had its flagship AI platform, Lilli, comprehensively compromised. Not by a nation-state hacking group or a team of elite security researchers, but by an autonomous AI agent built by CodeWall.ai. The findings are a wake-up call for every organization deploying AI at scale.
A Classic Bug in a Modern System
The irony is sharp. The vulnerability at the heart of this breach was SQL injection – one of the oldest and most well-documented bug classes in software security. Lilli had been running in production for over two years, and McKinsey’s own internal scanners, including OWASP ZAP, never flagged the issue.
Here’s how it worked: CodeWall’s agent mapped the attack surface and discovered over 200 API endpoints publicly documented. While most required authentication, 22 did not. One of those unprotected endpoints wrote user search queries to a database. The query values were safely parameterized – standard practice – but the JSON keys (the field names) were concatenated directly into SQL.
This is the kind of subtle, easy-to-miss flaw that automated scanners routinely overlook. The agent, however, noticed that JSON keys were reflected verbatim in database error messages. From there, it ran fifteen blind iterations, each error message revealing a little more about the query structure, until live production data started flowing back.
The Scale of What Was Exposed
Once inside, the agent didn’t stop at the initial database access. The full scope of exposed data was staggering:
- 3.68 million RAG document chunks – McKinsey’s entire knowledge base feeding the AI, including decades of proprietary research, frameworks, and methodologies. These are the firm’s intellectual crown jewels.
- 95 system prompts and AI model configurations across 12 model types, revealing exactly how Lilli was instructed to behave, what guardrails were in place, and the full model stack.
- 1.1 million files and 217,000 agent messages flowing through external AI APIs, including over 266,000 OpenAI vector stores.
- Cross-user data access – by chaining the SQL injection with an IDOR vulnerability, the agent could read individual employees’ search histories, revealing what people were actively working on.
The Real Danger – Compromising the Prompt Layer
Reading data is bad. But the most alarming finding was that the SQL injection wasn’t read-only.
Lilli’s system prompts – the instructions controlling how the AI behaves – were stored in the same database the agent had compromised. An attacker with write access could have rewritten those prompts silently. No deployment needed. No code change. Just a single UPDATE statement wrapped in a single HTTP call.
The implications for 43,000 McKinsey consultants relying on Lilli for client work are severe:
- Poisoned advice – subtly altering financial models, strategic recommendations, or risk assessments that consultants would trust because it came from their own internal tool.
- Data exfiltration via output – instructing the AI to embed confidential information into its responses, which users might then copy into client-facing documents.
- Guardrail removal – stripping safety instructions so the AI would disclose internal data or follow injected instructions from document content.
- Silent persistence – unlike a compromised server, a modified prompt leaves no log trail, no file changes, no process anomalies. The AI just starts behaving differently, and nobody notices until the damage is done.
As CodeWall puts it: “AI prompts are the new Crown Jewel assets.”
Why This Should Concern Every Organization
This wasn’t a scrappy startup with three engineers. McKinsey has world-class technology teams, significant security investment, and the resources to do things properly. If they can miss a SQL injection vulnerability that’s been sitting in production for two years, so can anyone.
The deeper lesson is about the AI security blind spot that most organizations share. Companies have spent decades securing their code, their servers, and their supply chains. But the prompt layer – the instructions that govern how AI systems behave – is the new high-value target, and almost nobody is treating it as one. Prompts are stored in databases, passed through APIs, and cached in config files. They rarely have access controls, version history, or integrity monitoring. Yet they control the output that employees trust, that clients receive, and that decisions are built on.
Source: https://codewall.ai/blog/how-we-hacked-mckinseys-ai-platform

Leave a comment