Whose Leak Is It? DLP When an AI Agent Holds Your OAuth Token
I keep four or five Claude agents running across my projects, and a couple of them are wired straight into the business systems I actually use — Xero for the books, HubSpot for the pipeline — over my own OAuth token. The day I connected the first one, I sat in front of the consent screen and made the call every operator makes: which scopes to grant.
I narrowed them by hand. The official Xero connector will ask for the lot if you let it — pages of scopes, payroll among them — so I trimmed the grant down to the handful an accounts agent could actually use, and only then let it connect.
And here is what I told myself while I did it, because it felt obviously true: the agent calls the same API my browser would, with my token. It can only ever see what I can see. If my Xero user is scoped to invoices and contacts, the agent is scoped to invoices and contacts. The boundary is the vendor's — Xero defines it, Xero enforces it — so data loss prevention is Xero's problem, not mine. I'd done the responsible thing and the rest was somebody else's RBAC.
Most of that holds up. The agent really does inherit my access boundary; the vendor really does own it. But "holds up" was doing an enormous amount of work in that sentence, and it took me walking back through my own configuration to see which part it was quietly carrying.
The question I'd actually answered
When I reasoned my way to "the vendor owns it," I'd answered a question about access: who is allowed to read this data? That answer was correct. What I had not answered — had not even noticed I was skipping — was a different question entirely: given that the agent can read it, where is it allowed to go next?
That second question is the whole of data loss prevention. Proofpoint defines DLP as "the combination of policies, technology, and processes that identify and prohibit unauthorized exposure, sharing, or transfer of sensitive data" (Proofpoint). Exposure, sharing, transfer. Every word is about what leaves. DLP was never an access discipline; it watches the exits — data in motion, the outbound channel, the moment information crosses a boundary it wasn't meant to cross.
A browser session blurred those two jobs into one. When I read a Xero report in my browser, the reading and the doing-something-with-it were the same human act, governed by the same human judgement. An agent splits them. It reads — that's access, the vendor's domain, working exactly as designed — and then it can transmit onward through a tool call, an outbound HTTP request, a created record. That second capability is egress, and it lives downstream of the token, on a stretch of the path the vendor was never positioned to see.
There's a sharper twist, and it comes from a vendor's own documentation. HubSpot's OAuth quickstart spells it out:
"Access tokens reflect the scopes requested from the app and do not reflect the permissions or limitations of what a user can do in their HubSpot account. For example, if a user has permissions to view only owned contacts but authorizes a request for the crm.objects.contacts.read scope, the resulting access token can view all contacts in the account and not only those owned by the authorizing user."
That's verbatim from HubSpot's docs. Read it twice. A user who can only see their own contacts in the HubSpot UI can mint a token that reads every contact in the account. So even my comfortable premise, "it can only see what I can see", isn't reliably true. The token's reach and the user's reach are two independent systems. And the egress half? That isn't on the vendor's side of the line at all.
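To make that gap concrete, here is a toy model in Python. It is not HubSpot's API: the record list, the owner field, and both function names are invented for illustration. The only thing taken from the docs is the rule that a token's reach follows its scope rather than the authorising user's permissions.

```python
# Toy data: three contacts with hypothetical owners.
contacts = [
    {"id": 1, "owner": "alice"},
    {"id": 2, "owner": "bob"},
    {"id": 3, "owner": "alice"},
]


def visible_in_ui(user, records):
    """User-level permission modelled as 'view only owned contacts'."""
    return [r["id"] for r in records if r["owner"] == user]


def visible_to_token(granted_scopes, records):
    """Token-level reach: decided by scope alone, with no ownership filter."""
    if "crm.objects.contacts.read" in granted_scopes:
        return [r["id"] for r in records]
    return []


ui_view = visible_in_ui("alice", contacts)
token_view = visible_to_token({"crm.objects.contacts.read"}, contacts)
```

In this model alice sees two contacts in the UI, while a token she authorises with the read scope sees all three.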
If that sounds like a definitional nicety, fair enough. Except it has already cost real companies real data.
The heist that broke nothing
In May 2025, Invariant Labs published an attack against the official GitHub MCP server, and it maps onto my setup so precisely that it's uncomfortable: a developer's own personal access token, an official MCP server, an AI assistant doing ordinary work.
Here's the shape of it. The developer's token can reach both a public repository and one or more private ones — entirely normal. An attacker opens an issue on the public repo, which anyone is allowed to do, and buries a prompt-injection payload in the issue text. Later the developer asks their assistant something unremarkable: have a look at the open issues. The agent calls the GitHub MCP server's list-issues tool, loads the malicious issue into its context, and the planted instructions take over. Using the same legitimate token, the agent reads files from the private repositories and then exfiltrates them by opening a pull request to the public repo — where the data is now freely readable by anyone.
Invariant's researchers walked away with real specifics about their test user: private repository names, a plan to relocate to South America, even a salary (Invariant Labs). No token was stolen. No tool was poisoned. No authentication broke. Every single call was authorised. And the line that should be pinned above every agent deployment is theirs:
"This is not a flaw in the GitHub MCP server code itself, but rather a fundamental architectural issue that must be addressed at the agent system level. This means that GitHub alone cannot resolve this vulnerability through server-side patches."
GitHub alone cannot fix it. Sit with that, because it dismantles the assumption I'd been resting on. The same pattern produced EchoLeak (CVE-2025-32711) against Microsoft 365 Copilot — a zero-click exfiltration Microsoft rated 9.3 critical, where a crafted email coaxed Copilot into packing internal data into an image URL the client auto-fetched to an attacker's server. The OAuth token was valid. The RBAC said Copilot could read those emails. None of it stopped the data leaving.
The thread tying both incidents together is the quiet one: the vendor's access layer saw nothing anomalous, because nothing was anomalous at the access layer. The read was authorised. The outbound action was authorised. The breach lived in the gap between them — the gap I'd handed entirely to the vendor in my head.
Why no patch closes it
The reason this can't simply be patched away has a name. Simon Willison called it the lethal trifecta in June 2025: an agent that combines access to private data, exposure to untrusted content, and the ability to externally communicate is exploitable, full stop. As he put it, "If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to that attacker" (Simon Willison).
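One way to make the trifecta checkable is to tag each tool an agent can call with the legs it contributes, then test the combined set. The sketch below assumes a hand-maintained inventory. The Tool class, its flags, and the example tool names are hypothetical, not part of MCP or any vendor SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical description of one tool exposed to an agent."""
    name: str
    reads_private_data: bool = False
    ingests_untrusted_content: bool = False
    can_communicate_externally: bool = False


def trifecta_legs(tools):
    """Report which of the three legs the combined toolset provides."""
    return {
        "private_data": any(t.reads_private_data for t in tools),
        "untrusted_content": any(t.ingests_untrusted_content for t in tools),
        "external_communication": any(t.can_communicate_externally for t in tools),
    }


def has_lethal_trifecta(tools):
    """True when the toolset, taken together, has all three legs."""
    return all(trifecta_legs(tools).values())


# Illustrative inventory; names and flags are made up for this example.
accounts_agent = [
    Tool("read_invoices", reads_private_data=True, ingests_untrusted_content=True),
    Tool("http_post", can_communicate_externally=True),
]
```

The check runs over the whole toolset rather than tool by tool, because the legs usually come from different tools that each look harmless alone.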
My Xero agent has all three by design. It reads private financial data — that's the job. It ingests untrusted content the moment it touches an invoice PDF, a supplier email, a web page. And it can make outbound calls. Three legs, every one of them load-bearing, none removable without removing the point of the agent. Willison names the protocol directly:
"The problem with Model Context Protocol—MCP—is that it encourages users to mix and match tools from different sources that can do different things. Many of those tools provide access to your private data. Many more of them—often the same tools in fact—provide access to places that might host malicious instructions. And ways in which a tool might externally communicate in a way that could exfiltrate private data are almost limitless."
I want to be careful here, because the easy misreading is "so MCP is broken, don't use it." That's wrong, and I'm not going to pretend otherwise to make the point land harder. The MCP authorization spec is genuinely good. It mandates audience-bound tokens via RFC 8707, requiring clients to name the specific server a token is for and servers to reject any token not minted for them. It explicitly forbids token passthrough — "MCP servers MUST NOT accept any tokens that were not explicitly issued for the MCP server" (MCP Security Best Practices). It tells clients to follow least privilege and request only the scopes they need. The spec closes the access-layer holes with real rigour.
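As a rough illustration of what audience binding buys, the check below refuses any token whose aud claim does not name this server. It is a sketch, not the spec's reference logic. It assumes the token's signature and expiry were already verified elsewhere, that the claims arrive as a plain dict, and that the server URI shown is a made-up canonical identifier.

```python
# Hypothetical canonical URI for this MCP server.
THIS_SERVER = "https://mcp.example.com"


def audience_matches(claims, server_uri=THIS_SERVER):
    """Accept only tokens whose 'aud' claim names this server.

    'aud' may be a single string or a list of strings. Anything else,
    including a missing claim, is rejected.
    """
    aud = claims.get("aud")
    if isinstance(aud, str):
        audiences = [aud]
    elif isinstance(aud, (list, tuple)):
        audiences = list(aud)
    else:
        return False
    return server_uri in audiences


minted_for_us = {"sub": "agent-1", "aud": "https://mcp.example.com"}
minted_for_other = {"sub": "agent-1", "aud": ["https://other.example.com"]}
```

A token minted for another service fails this check even when it is otherwise valid, which is the passthrough case the spec forbids. Note that nothing here looks at where the data goes after a successful read.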
It cannot close the egress hole, because that hole sits on the operator's side of the line. The honest bad news sits there too: there is no clean technical fix waiting to be installed. Willison is blunt about the guardrail products being sold to plug it: they "almost always carry confident claims that they capture '95% of attacks' or similar... but in web application security 95% is very much a failing grade." Three years into the problem, OpenAI's own CISO, Dane Stuckey, says flatly that "prompt injection remains a frontier, unsolved security problem" (via Simon Willison). I'm not going to oversell a mitigation a frontier lab admits it hasn't solved.
So if the vendor can't fix it and there's no product that reliably will, the only useful question left is who owns which half. That one has an answer.
The split
Think about a contractor with a building pass. Facilities issues a pass for third-floor work; it opens the main door, the lifts, the third-floor offices — exactly what was approved. One afternoon the contractor needs a photocopier, and the only one is in the records room on two. The pass works there too, because the access policy was drawn broad, not because anyone specifically decided "contractors may enter records." He photographs an open filing cabinet and walks out. No alarm. The pass was legitimate, his presence authorised, and nothing at the door was watching what left in his pocket.
That's the whole problem in one image, and it sorts cleanly into three duties.
The vendor owns access. Honour the scopes, validate the token audience, and — the part that's actively improving — ship granular scopes so the pass opens fewer rooms. Xero is mid-migration to exactly this: as of March 2026 newly created apps must use granular scopes, with broad legacy scopes deprecated. Xero's Grace Benedek Rooney frames the rationale precisely: "Instead of broad access, your app now requests only the exact permissions it needs. This transparency gives users more control during the OAuth 2.0 flow, making it clear how their data is used" (Xero Developer Blog). My opening instinct was right here. This genuinely is the vendor's job, and the good vendors are doing it.
The operator owns egress. This is the half my original model skipped, and it's mine. It starts with the one lever I fully control: least-privilege token issuance. The official Xero MCP server discloses its bearer-token scope list, and it's broad — roughly twenty distinct scope strings spanning invoices, payments, bank transactions, manual journals, reports, contacts, settings, and payroll (payroll.settings, payroll.employees, payroll.timesheets). The README's own Claude Desktop example, meanwhile, narrows the set to three: accounting.invoices accounting.contacts accounting.settings (XeroAPI/xero-mcp-server). Twenty versus three. An agent that reconciles invoices has no business holding a token that can read employee payroll, and the difference between those two configurations is one environment variable I set.
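A small guard can keep that choice honest. The sketch below compares a space-separated scope string against a per-agent allow-list and reports anything extra. The role name and the allow-list structure are my own assumptions; the scope strings are the ones quoted above.

```python
# Hypothetical per-agent allow-list, using the README's three-scope example.
ALLOWED_SCOPES = {
    "bookkeeping": {
        "accounting.invoices",
        "accounting.contacts",
        "accounting.settings",
    },
}


def excess_scopes(role, requested):
    """Return scopes in the space-separated string that the role may not hold."""
    return set(requested.split()) - ALLOWED_SCOPES[role]


narrow = "accounting.invoices accounting.contacts accounting.settings"
too_broad = narrow + " payroll.employees payroll.timesheets"

print(sorted(excess_scopes("bookkeeping", narrow)))     # []
print(sorted(excess_scopes("bookkeeping", too_broad)))  # ['payroll.employees', 'payroll.timesheets']
```

Run before the agent starts, a non-empty result is a reason to refuse to launch rather than a warning to log.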
That split isn't hypothetical for me. One agent reconciles expenses and keeps the bookkeeping straight in Xero; another works the HubSpot pipeline, drafting the follow-ups I'd otherwise forget. I gave each a token scoped to its own job and nothing more — the bookkeeping agent can't see the CRM, the CRM agent can't see the ledger. Tidy. And still beside the point: the expense agent reads every invoice I hand it and writes a summary somewhere I can read it back. The reading is legitimate, the scope is tight, and the data has still left the building. That movement is the half I own.
OWASP puts the duty exactly where the building-pass analogy does. Its 2025 guidance on Excessive Agency (LLM06:2025) tells operators to "track user authorization and security scope to ensure actions taken on behalf of a user are executed on downstream systems in the context of that specific user, and with the minimum privileges necessary" (OWASP). Minimum privilege, user context, downstream — that's the operator's checklist, not the vendor's.
But I won't pretend scope minimisation is a solution. It shrinks the blast radius; it does not close the trifecta. A three-scope agent that reads an invoice still reads private data, still ingests untrusted content, still can communicate outward. Narrowing scopes makes a successful exfiltration smaller, not impossible. The rest of the operator's job — constraining outbound destinations, allow-listing where the agent may send, keeping a human in the loop on writes and sends — is real work in a domain where, as we just established, no tool does it reliably yet. Necessary, not sufficient. I'd rather say that plainly than sell a checklist that ends in a green tick.
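For the outbound side, the simplest version of those controls is a gate every outbound request must pass through. The sketch below is an assumption-heavy illustration. The host list, the decision labels, and the rule that anything other than GET or HEAD needs a human are all placeholders I chose, and a real deployment would more plausibly enforce this at a proxy or network layer than in in-process code.

```python
from urllib.parse import urlsplit

# Illustrative allow-list: the only host this agent may talk to.
ALLOWED_HOSTS = {"api.xero.com"}

# Methods treated as reads. Everything else is held for human approval.
READ_METHODS = {"GET", "HEAD"}


def egress_decision(method, url):
    """Classify an outbound request as 'allow', 'needs_approval', or 'block'."""
    host = (urlsplit(url).hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        # Unknown destination: refuse regardless of method.
        return "block"
    if method.upper() in READ_METHODS:
        return "allow"
    # Writes and sends to an allowed host still wait for a human.
    return "needs_approval"


read_call = egress_decision("GET", "https://api.xero.com/api.xro/2.0/Invoices")
write_call = egress_decision("POST", "https://api.xero.com/api.xro/2.0/Invoices")
leak_call = egress_decision("GET", "https://attacker.example/img.png?d=secret")
```

The third call is the image-URL shape of exfiltration: a plain GET to a host nobody approved. The gate narrows where data can go; it does not stop an allowed destination from being misused.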
Two of the three duties can be designed for. The third can't be delegated at all — and that's not an engineering claim. It's a legal one.
The part that doesn't split
You can delegate the work to an agent. You cannot delegate the accountability, and this isn't my opinion — it's written into the law on both sides of the world I work between.
Under GDPR, the controller — the party that determines the purposes and means of processing — "shall be responsible for, and be able to demonstrate compliance with" the regulation's principles (Article 5(2)). Article 24 puts the duty to implement "appropriate technical and organisational measures" directly on that controller (Article 24). And the European Data Protection Board is explicit that hiring a processor — signing the contract, outsourcing the handling — "cannot relieve the controller from its accountability" (EDPB Guidelines 07/2020). The accountability is assigned by law, not by contract, and not by architecture.
From where I sit in Singapore, the PDPA says it as plainly as anywhere on earth. An AI agent operating on my behalf is, in the statute's terms, close kin to a data intermediary — "an organisation which processes personal data on behalf of another organisation." And the law refuses to let the delegation launder responsibility:
"An organisation is responsible for personal data in its possession or under its control, including personal data that is in the possession of a data intermediary and processed by the data intermediary on behalf of the organisation."
That's section 11(3). Section 4(3) closes any remaining gap: the organisation "has the same obligation under this Act in respect of personal data processed on its behalf and for its purposes by a data intermediary as if the personal data were processed by the organisation itself." As if you'd processed it yourself. There's no door there.
What makes this bite for agents specifically is something Willison observed about the architecture, almost in passing: "an AI system cannot be held accountable for its actions" (Simon Willison). He trusts a friend with his logged-in browser only because social consequences exist if the trust is abused. The agent has no such hook. It cannot be liable. The vendor honoured its contract. So when the data leaves, accountability doesn't dissolve — it falls, by elimination and by statute, on the human who granted the token.
And the gap between that assignment and any actual governance is wide. Okta found that "while 91% of organizations deploy AI agents, only 10% have a management strategy" (Okta). The accountability has already landed. The governance, mostly, hasn't.
So, whose leak is it
Back to that consent screen, my own token, my own agent. The question "whose leak is it?" had felt, when I sat down, like a question about Xero. It was always a question about me.
The refined answer fits in a breath: the vendor owns the door, the operator owns the exit, and the name on the breach notice belongs to the human who connected the two. I was mostly right that the agent only sees what I can see. I was wrong that this was the whole question, and being right about the first half is precisely what let me stop looking at the second.
The agent didn't take on any access I didn't already have. It took on an exit I'd never once had to think about.