Lily Hack: AI Procurement Ignores Agent Realities

Nate B Jones, 2026-05-10

the gist

A $20 autonomous agent gained full read/write access to McKinsey's Lily platform through SQL injection; 22 of the platform's 200 endpoints required no authentication. The root cause is organizational: developers are excluded from procurement, and traditional SaaS buying fails for agent workflows that cross systems.

Lily Incident Reveals Organizational Flaws

Codewall's autonomous agent spent $20 and two hours to gain full read and write access to McKinsey's Lily platform, which 70% of McKinsey's 40,000 consultants use daily. The agent accessed tens of millions of chat messages, tens of thousands of user accounts, and all of the system prompts, which were writable. The exploit used SQL injection, a vulnerability known since 1998, against a platform that had been in production for over two years. McKinsey patched it within an hour of Codewall's responsible disclosure on March 9th. Postmortems focused on authenticating endpoints and sanitizing inputs. But 22 of 200 endpoints lacked authentication, including writable production ones, and a gap that wide signals an engineering-culture problem rather than an individual error: no organization lacks the knowledge that endpoints need authentication.
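The input-sanitizing fix named in the postmortems can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module and a hypothetical messages table; the schema, function names, and payload are assumptions for illustration, not details of Lily. The unsafe function splices user input into the SQL text, while the safe one passes it as a bound parameter.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, owner TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO messages (owner, body) VALUES (?, ?)",
        [("alice", "draft deck"), ("bob", "client notes")],
    )
    return conn


def find_messages_unsafe(conn: sqlite3.Connection, owner: str) -> list:
    # Vulnerable: the caller's input becomes part of the SQL statement itself.
    query = f"SELECT body FROM messages WHERE owner = '{owner}'"
    return [row[0] for row in conn.execute(query)]


def find_messages_safe(conn: sqlite3.Connection, owner: str) -> list:
    # Parameterized: the input is bound as a value and never parsed as SQL.
    query = "SELECT body FROM messages WHERE owner = ?"
    return [row[0] for row in conn.execute(query, (owner,))]


conn = make_db()
payload = "nobody' OR '1'='1"  # classic injection string

leaked = find_messages_unsafe(conn, payload)   # every row in the table
blocked = find_messages_safe(conn, payload)    # no owner has this literal name
```

With the injected string, the unsafe query's WHERE clause is always true and returns every owner's messages, while the parameterized query treats the whole string as an owner name and returns nothing. Parameterization addresses only the injection; the unauthenticated endpoints are a separate gap.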

Traditional Procurement Fails Agentic AI

Enterprise software procurement follows a fixed sequence: strategic decision, contract negotiation, security review, IT integration, and only then do developers build. This works for bounded SaaS like Salesforce, where the screens humans use double as the permissions model. Agents lack eyes; they query systems via code, crossing CRM, support tickets, contracts, usage data, transcripts, and wikis. Permissions must therefore exist as tokens, roles, and scopes, all auditable across system boundaries. Implementation details like authentication, permissions, audits, and token costs determine whether the strategy is viable at all. Excluding developers early commits the organization to an untested platform, and the gaps are discovered six months later during the push to production.
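What "permissions as tokens, roles, and scopes" might look like in code can be sketched as follows. This is a minimal illustration under stated assumptions: the AgentToken type, the system:action scope strings, and the agent and role names are all hypothetical and do not come from any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentToken:
    """Hypothetical credential issued to one agent (illustrative only)."""
    agent_id: str
    role: str
    scopes: frozenset  # e.g. {"crm:read", "tickets:read"}


def is_allowed(token: AgentToken, system: str, action: str) -> bool:
    # Deny by default: allow only when the exact "system:action" scope
    # was granted to this token.
    return f"{system}:{action}" in token.scopes


token = AgentToken(
    agent_id="renewal-agent",
    role="account-research",
    scopes=frozenset({"crm:read", "tickets:read"}),
)

# The same check runs at every system boundary the agent crosses.
requests = [
    ("crm", "read"),
    ("tickets", "read"),
    ("crm", "write"),
    ("contracts", "read"),
]
decisions = {req: is_allowed(token, *req) for req in requests}
```

Here the agent can read the CRM and support tickets but cannot write to the CRM or read contracts, because neither scope was granted. The point of the sketch is that the boundary lives in data a reviewer can inspect, not in which screens a person happens to see.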

Vendor Responses Signal Implementation Focus

Anthropic and OpenAI launched enterprise services that embed engineers in customer build rooms. SAP acquired Dreo and Prior Labs for unified data layers and tabular foundation models on business ledgers. Pinecone released Nexus so that agents do not rebuild context on every run. Salesforce shipped headless 360, which exposes APIs, tools, and a CLI because agents skip screens. ServiceNow opened Action Fabric so that external agents can trigger governed workflows with identity and audit. All of these address how agents reach systems, with what permissions, through which workflows, under what audits, and at what cost; the model alone was never sufficient.

Key Questions and Checklist for Stacks

Platforms must distinguish humans from agents: a senior consultant may get broad access, but an agent needs scopes bounded to each task to avoid company-wide exposure. Audit trails must trace agent actions in a form built for regulators, not for users. Controls must allow instant revocation from a console. Defaults matter under pressure: 22 unauthenticated endpoints are evidence of weak team defaults. A six-question checklist, available on the presenter's Substack, covers delegated agent permissions, token costs at scale, regulator-ready audits, and reversibility; it applies to vendor evaluations and internal builds alike. Involve developers before signing to avoid Lily-like liabilities.
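The three requirements above (task-bounded agent scopes, a traceable audit trail, instant revocation) can be sketched together. This is a hedged illustration, not the presenter's checklist or any platform's API: Grant, AccessControl, the grant id, and the task name are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Grant:
    principal: str
    kind: str            # "human" or "agent"
    task: str            # agents must name the task the grant is bound to
    scopes: frozenset
    revoked: bool = False


@dataclass
class AccessControl:
    grants: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def delegate(self, grant_id: str, grant: Grant) -> None:
        # Agents never receive open-ended access in this sketch.
        if grant.kind == "agent" and not grant.task:
            raise ValueError("agent grants must be bound to a task")
        self.grants[grant_id] = grant

    def revoke(self, grant_id: str) -> None:
        # Takes effect on the very next check.
        self.grants[grant_id].revoked = True

    def check(self, grant_id: str, scope: str) -> bool:
        grant = self.grants.get(grant_id)
        allowed = (
            grant is not None and not grant.revoked and scope in grant.scopes
        )
        # Every decision is recorded, including denials.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "grant_id": grant_id,
            "principal": grant.principal if grant else None,
            "kind": grant.kind if grant else None,
            "task": grant.task if grant else None,
            "scope": scope,
            "allowed": allowed,
        })
        return allowed


ac = AccessControl()
ac.delegate("g1", Grant("summarizer-agent", "agent",
                        "summarize-ticket-4821", frozenset({"tickets:read"})))

first = ac.check("g1", "tickets:read")         # inside the task's scope
out_of_scope = ac.check("g1", "crm:read")      # never granted
ac.revoke("g1")
after_revoke = ac.check("g1", "tickets:read")  # denied once revoked
```

Each audit entry records who acted, whether the principal was a human or an agent, the task it was acting under, and the decision, which is the kind of record an external reviewer could replay. A production system would also need tamper-evident storage and expiry on grants, both omitted here.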