AI Needs the Context Auth Was Built to Limit

Access controls were built for two cases: automated workloads with a predictable footprint, or a human stitching tools together by hand. AI agents are neither, and the Vercel incident previews what breaks when we stretch the old model to cover them.

Vercel's recent security incident made me wonder how attacks like this might evolve as AI tools become more widely used.

Vercel said the incident began with the compromise of Context.ai, a third-party AI tool used by one of its employees. The attacker used that path to take over the employee's Google Workspace account and gain access to some Vercel environments and environment variables that were not marked sensitive. Vercel also warned that the compromised OAuth app could potentially affect hundreds of users across many organizations.

I think the underlying issue is that AI's need for context doesn't fit the model underneath. It reaches across the system boundaries auth was built to enforce. Modern auth models classify workloads into two rough patterns. One is bounded automation (e.g., CI, Terraform runs), where the workload's footprint could be described when the policy was written. The other is human-driven work, where the person is the integrator, filling gaps between tools. Both patterns let the auth model stay narrow because the context and the intended actions were both known before the policy was defined.

AI agents fit neither pattern because their usefulness depends on pulling in more context as they work (e.g., the repo, the issue, the runbook, the incident trail). At every step, more context tends to make the tool more useful to the person running it. That is why zero-trust thinking becomes harder to apply cleanly here. The instinct behind the principle is still right, but useful AI tools have to pull context from across your environment as they run, and that is not a pattern the existing model knows how to describe.

Incidents like this are a preview of what happens when systems that depend on broad context keep getting layered onto an auth model whose purpose was least-privilege software access.

Auth was built when context was knowable upfront

In the bounded automation case, zero trust and the modern authorization stack (OIDC, short-lived tokens, scoped service accounts, policy-as-code) matured alongside Terraform runs, CI/CD jobs, and service-to-service calls in microservices. These workloads share the trait that you can describe exactly what they need to do before they run. A Terraform plan needs provider credentials scoped to specific resources. A CI job needs certain registry pushes. The identity, action, and target are all known when the policy is written. This predictability makes fine-grained authorization functional. That is why approaches like SPIFFE, OIDC-based workload federation, and mesh-level authorization policies fit those environments so well. The workload is bounded at policy time, so authorization can be specific.
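The bounded case can be made concrete with a small policy sketch. The service account, registry, and secret names here are hypothetical; the point is that identity, action, and target are all fixed before the workload ever runs, so the check can be exact:

```python
# Hypothetical policy-as-code for a bounded CI workload. Identity, action,
# and target are all known when the policy is written, so authorization
# can be specific: only the exact declared (action, target) pairs pass.
POLICY = {
    "identity": "ci-runner@example-project",    # hypothetical service account
    "allowed": {
        ("push", "registry.example.com/app"),   # the one registry it pushes to
        ("read", "secrets/ci/deploy-key"),      # the one secret it reads
    },
}

def authorize(identity: str, action: str, target: str) -> bool:
    """Grant only if this exact action on this exact target was declared upfront."""
    return identity == POLICY["identity"] and (action, target) in POLICY["allowed"]

assert authorize("ci-runner@example-project", "push", "registry.example.com/app")
assert not authorize("ci-runner@example-project", "read", "secrets/prod/db-password")
```

Nothing in this check is decided at runtime; the policy is a complete description of the workload's footprint, which is exactly what agents do not have.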

The other pattern is the human case, usually mediated through an authenticated user (SSO, OAuth, SAML). A person reads the issue, recalls the architectural decision from last quarter, pulls up the runbook in another tab, checks the incident channel, and mentally stitches it all together. The tools do not need broad reach because the person handles that part. Each system access is a separate OAuth handshake against a known human identity, with access scoped to what that system's policy allows. The auth model did not need to connect the systems because the human was the connection and checked independently at every hop. That is why least privilege felt correct and practical for so long. Access could be limited because the tools were not asked to do the synthesis.

AI agents break both patterns because they do the integration work humans used to do, but without per-system human checks. What they do is decided at runtime by the model, based on the task in front of it. You can write a policy for an agent, but any useful setting leads to one of two failure modes. If the policy is broad enough to let the agent do its job, it ends up describing most of the environment. If it is narrow enough to constrain the action space, the agent keeps hitting boundaries on work it should complete. The agent wants a large context window because it does not know in advance which parts of your environment will be relevant.

A code review agent that only sees the diff is not doing the work a human reviewer does. The reviewer also recalls prior incidents, design decisions behind the module, and the current state of dependencies, and is separately authorized to access each. If the agent is to do that work, it must reach the same sources. The per-hop human authorization that gated each step collapses into a single broad grant, and the bounded policy that described a workload's footprint has nothing to attach to.


That is why reading the Vercel incident simply as a case of a tool being given too much access misses the lesson. The tool needed that access to be useful, and the authorization model had no clean way to express "you can have broad reach, but only when the task warrants it, and only to the context the task requires." That kind of policy assumes either a human in the loop at each hop or a workload bounded at policy time, and agents are neither. The grant was wide because anything narrower would have made the tool ineffective.

Why scope keeps expanding

Once both patterns stop holding, every AI tool rollout follows the same shape.

The first version of a code review agent ships with narrow access because that is what the review process was comfortable with, and it is underwhelming. Adding issue tracker integration lets the agent see what the change is supposed to do, and it improves. Wiring in prior MRs and incident history in the same service improves it again. Pulling in deploy telemetry and recent architecture decisions helps the tool produce reviews like a senior engineer. Each request is approved by someone who can see why broader access makes the tool more useful. None of the requests, alone, looks like a bad decision.

Broader access is what turns the tool from a demo into something teams actually use. You cannot "review your way out" of this with more discipline, because the pressure to expand scope comes from the tool doing its job better, not from anyone cutting corners. In a large enterprise, this happens in parallel across many teams, so no single reviewer sees the aggregate.

With AI tools, the blast radius is no longer in any single grant. It is in the accumulation of grants the tool picks up as it gets more useful. Even when each grant is scoped and short-lived, a tool that holds Drive, Jira, GitLab, and the identity provider together is a concentration of context that did not exist before. One compromise gives an attacker a synthesized view of the environment that most human users would never hold at once.

That invisibility erodes the discipline around these tools in small, ordinary ways. A grant goes in without a clean policy definition. A token's lifetime drifts upward because rotation breaks the tool. A review step is waived because the next scope request is labeled minor. A temporary test grant remains after the workflow ships.

This aggregate is what the auth stack was never built to surface.

What the next access model probably needs

If useful AI keeps pushing toward broader access, "review harder" won't be enough. We probably need a different model for granting access in the first place.

A few directions seem more promising than the standing broad-scope approach most companies are drifting toward today.

Mediated access. Instead of every AI tool connecting directly to every source system, context should be brokered through a control layer that sits between the model and the systems it reads from. I described something like this in a previous post as a context gateway, a place to classify, rank, and filter sources before anything reached the agent. The authorization argument points at the same layer. If you are already going to need a gateway to answer "what did the agent actually consume," it is also the natural place to enforce "what is the agent allowed to reach, for which task." At that point, provenance and authorization collapse into the same control plane. In practice, this means no AI tool gets a direct OAuth grant to Drive, Jira, GitHub, or the identity provider. It gets a grant to the gateway, and the gateway holds the real credentials, scoped with policy applied per workflow.
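A minimal sketch of that broker, with hypothetical source and workflow names: the agent never holds source credentials, and the gateway applies a per-workflow scope before anything is fetched.

```python
# Hypothetical context gateway sketch: the agent asks the gateway for context,
# and the gateway holds the real credentials and enforces per-workflow policy.
from dataclasses import dataclass, field

@dataclass
class ContextGateway:
    real_credentials: dict                      # source -> secret; never exported to agents
    policy: dict = field(default_factory=dict)  # workflow -> set of sources it may read

    def fetch(self, workflow: str, source: str) -> str:
        allowed = self.policy.get(workflow, set())
        if source not in allowed:
            raise PermissionError(f"{workflow} may not read {source}")
        # A real gateway would call the source API here with the held credential,
        # and log what was consumed (provenance and authorization in one place).
        return f"<context from {source}>"

gw = ContextGateway(
    real_credentials={"jira": "…", "gitlab": "…", "drive": "…"},  # placeholders
    policy={"code-review": {"jira", "gitlab"}},                   # per-workflow scope
)
print(gw.fetch("code-review", "jira"))  # allowed for this workflow
# gw.fetch("code-review", "drive")      # would raise PermissionError
```

The design choice worth noting is that the `fetch` call is also the natural audit point: because every read goes through it, "what did the agent consume" and "what may the agent reach" are answered by the same layer.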

Short-lived access. Short-lived credentials are standard practice for bounded automation. STS-style token exchange, workload identity federation, CI runner tokens, scoped service accounts: the pattern is well understood. The gap is that AI tools rarely apply it to what matters most. The workflow runtime may be ephemeral, spun up per task and torn down afterward, but the OAuth grant the tool uses to reach Drive, Jira, GitHub, or the identity provider is typically standing and long-lived. That is backwards for this workload. A better model issues the context-access grant when the task starts, scopes it to what the task needs, and expires it when the task ends. For agents acting on a ticket, the ticket is the obvious anchor. The grant lives for the duration of that work, then goes away. This fits how agents are used in practice (minutes or hours, not months) and shrinks the blast radius of a compromised tool to tasks open at that moment.
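The ticket-anchored grant can be sketched in a few lines. The ticket ID and scope names are hypothetical; the shape is what matters: the grant is minted when the task starts and stops validating when the task's window closes.

```python
# Hypothetical task-bound grant: issued when work on a ticket starts,
# scoped to what that task needs, and expired when the task ends.
import time

def issue_grant(ticket_id: str, scopes: set, ttl_seconds: int = 3600) -> dict:
    """Mint a grant anchored to a ticket, with an explicit expiry."""
    return {
        "ticket": ticket_id,
        "scopes": scopes,
        "expires_at": time.time() + ttl_seconds,
    }

def is_valid(grant: dict, scope: str) -> bool:
    """A scope is usable only if it was granted AND the task window is still open."""
    return scope in grant["scopes"] and time.time() < grant["expires_at"]

grant = issue_grant("PROJ-142", {"jira:read", "gitlab:read"}, ttl_seconds=2)
assert is_valid(grant, "jira:read")
assert not is_valid(grant, "drive:read")  # never granted for this task
time.sleep(2.1)
assert not is_valid(grant, "jira:read")   # expired with the task
```

A compromised tool in this model exposes only the grants for tasks open at that moment, not a standing OAuth grant accumulated over months.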

Context-bound access. This may be the bigger shift. Today, most auth models focus on systems and scopes: can this tool read Drive, can it write to GitLab? The risk from AI is increasingly about which combinations of context can be assembled for a task, not which systems a tool can reach. The exposure lives in the combination. A tool that pulls together Drive, Jira, Slack, and the customer database into a single inference is a different class of risk, even when every individual grant looks reasonable. This suggests a policy model where access is bound to a task, time window, purpose, and sensitivity class, and the gateway decides at runtime which combinations are permitted. The policy responds to the state of the world when the request is made.
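One way to express "the exposure lives in the combination" is a policy over sets of sources rather than individual scopes. The source names and forbidden pairs here are hypothetical; the point is that each source can be individually permitted while certain assemblies are not.

```python
# Hypothetical combination policy: each source may be fine on its own,
# but certain assemblies for a single inference are a different risk class.
FORBIDDEN_COMBINATIONS = {
    # customer data assembled alongside broad internal comms or documents
    frozenset({"customer-db", "slack"}),
    frozenset({"customer-db", "drive"}),
}

def combination_allowed(requested_sources: set) -> bool:
    """Reject a request if any forbidden combination is a subset of it."""
    requested = frozenset(requested_sources)
    return not any(bad <= requested for bad in FORBIDDEN_COMBINATIONS)

assert combination_allowed({"jira", "gitlab"})
assert combination_allowed({"customer-db"})                       # alone, fine
assert not combination_allowed({"customer-db", "slack", "jira"})  # blocked as a set
```

A real version would also condition on task, purpose, and sensitivity class, as the post describes; the subset check is just the piece per-system scopes cannot express.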

None of that is a finished design, but it is closer to what the problem actually requires than treating AI tools as ordinary SaaS integrations with slightly wider scopes.

Auth has to move upstream

Authorization used to live at the point of access. An agent calls an API, the API checks the token, and access is granted or denied based on whether the caller has rights to that resource. This works when the policy question is "can this caller read this resource?" It stops working when the question is "should this agent be allowed to assemble these pieces of context together, for this task, right now?"

That question has to be answered before the agent ever calls any API, at the orchestration layer, above individual system boundaries. Individual system APIs have no way to see the shape of the assembly happening above them. Each one is judging a single hop against a single identity, with no visibility into the broader workflow the request is part of.

Traditional auth classification lived downstream, inside each system's policy against its own resources. AI tools push classification upstream, to the layer where context is being pulled together from multiple systems for a single inference. Many details remain open: who owns that classification inside the organization, what the policy language is built from (tasks, purposes, sensitivity classes, etc.), how it integrates with existing identity providers, how it gets versioned as systems and workflows change. Mediated access, task-bound grants, and context-bound policy all point to the same shift. I think as this rolls out at scale, the boundary worth watching sits above the individual systems where auth decisions currently live.