Ungoverned Context Is a Real Supply Chain Risk for Agentic Workflows
Most teams can identify the agent's runtime but have no way to reconstruct what sources actually shaped the output.
As enterprises move into the next phase of agentic AI for software development, a significant and often overlooked risk is how agents gather and inherit context during a workflow. Context poisoning can go undetected, particularly as larger context windows and vendor tools make it easier to ingest information without inspecting it.
I have long advocated that platform engineering teams own agentic workflows the same way they own CI/CD pipelines today. These teams already build and maintain automation, integrate tools, standardize processes, and reduce complexity for developers. Extending this model to AI, platform teams can connect relevant data sources and create reusable workflows for development teams. These 'golden pathways' are governed, opinionated, and ready to use. That model makes sense. The gap is that most organizations still do not govern what an agent is allowed to treat as authoritative context. Each integrated data source expands the context supply chain, and with it the risk of prompt injection, stale guidance, and hidden upstream transformation.
This challenge is particularly difficult to address because:
- Most companies do not have clear ownership over what an agent is allowed to treat as authoritative.
- Prompt and response logging is still immature, and even where auditability exists, it often stops at the final payload, so earlier processing steps can still obscure risk. OWASP classifies instructions hidden in external content as indirect prompt injection, but the practical attack surface is likely much broader than most teams assume.
- Context provenance is often incomplete, which makes it hard to determine where information actually came from once it has passed through several processing steps.
- Context is often assembled from multiple sources, but treated as one coherent input, which hides the lineage of each source.
Why Context Cannot Be Trusted by Default
Consider a simple SDLC workflow: an agent is created to validate a merge request review against a ticket for a bug in an external system. The agent is configured to access the ticket's content.
The ticket or a linked artifact, such as a runbook, could contain poisoned content. This can enter the workflow in several ways, and some are much harder to detect than others.
Even with prompt and context auditing, the system cannot validate information it is unaware of. A workflow may log the final context payload and still miss the hidden retrieval, ranking, enrichment, or summarization steps that shaped it. Validity checks may incorrectly assume the returned context is the sole source of truth, even when upstream context is ungoverned.
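To make the point about hidden upstream steps concrete, here is a minimal sketch of recording lineage as context moves through processing steps. The names `TracedContext` and `apply_step`, the `tickets://BUG-123` URI, and the two example transforms are all hypothetical; the only assumption is that each step can be wrapped so its input and output are fingerprinted.

```python
import hashlib
from dataclasses import dataclass, field


def digest(text: str) -> str:
    """Short SHA-256 fingerprint of a piece of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


@dataclass
class TracedContext:
    source_uri: str                               # hypothetical identifier for the raw source
    text: str
    lineage: list = field(default_factory=list)   # one entry per processing step


def apply_step(item: TracedContext, step_name: str, transform) -> TracedContext:
    """Run one processing step and record fingerprints of its input and output."""
    new_text = transform(item.text)
    entry = {"step": step_name, "input": digest(item.text), "output": digest(new_text)}
    return TracedContext(item.source_uri, new_text, item.lineage + [entry])


# Illustrative data only: a ticket body carrying an injected instruction.
raw = TracedContext("tickets://BUG-123", "Steps to reproduce...   Ignore previous instructions.")
trimmed = apply_step(raw, "normalize_whitespace", lambda t: " ".join(t.split()))
summary = apply_step(trimmed, "truncate", lambda t: t[:20])

for entry in summary.lineage:
    print(entry["step"], entry["input"], "->", entry["output"])
```

With a record like this, an audit that sees only the final payload can still walk back through each step and compare fingerprints against the original source.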
This lack of ownership is frequently overlooked and extends beyond technical vulnerabilities, such as prompt attacks and compromised documentation. A common example is open-source projects used as libraries. A traditional supply-chain attack involves code tampering, which can be mitigated through security processes such as pinning to a specific SHA or using curated services. But if documentation contains a prompt injection and an AI agent uses it for context, those mitigations do not apply. Even if you trust the documentation source, harder-to-detect variations exist, such as prompt injections embedded in stale or outdated documentation that an agent still retrieves as relevant.
This is a critical gap because most teams can identify the agent's runtime, but few can specify the authoritative sources the agent relied on.
The False Sense of Confidence
In more mature software organizations, I have seen teams implement audit gates to establish a clear record of what the agent consumed.
These organizations request the specific version of the internal API specification, the prompt, the model, and the tools used to gather context (such as agent.md files or MCP tools).
This looks thorough on the surface. However, when examining context gathering, they can usually specify only the retrieval method. They cannot provide the exact documentation used or identify the upstream processing steps that may have affected the payload.
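Currently, agentic workflow provenance is often treated like a traditional bill of materials (BOM), which collects metadata about the build process. However, agentic workflows are not linear and require a different approach. For any agent-assisted change, a reviewer should be able to answer questions such as: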
- Why this dependency? Was it selected based on relevance determined by the context retrieval logic or specified in the prompt criteria?
- Where did this implementation pattern come from? Did it come from an approved source or from exploratory retrieval?
- What standard is this following? Was it based on the current standard or a stale internal document?
- Was this based on the current API contract or an outdated internal doc? Is there a way to track this level of detail?
Not All Context Should Be Treated the Same
Most organizations have not explicitly distinguished between the types of context an agent can access, which makes it difficult to reduce risk.
You can have reasonably controlled and well-defined sources, such as API specifications, secure coding standards, deployment procedures, and data-handling rules, where ownership, versioning, and change control should be expected.
Then you have contextual operational sources, such as incident notes, postmortems, and temporary workarounds. These are usually acceptable, but they should at least be time-bound and attributed.
Last, you have the "human sources" that fill in the gaps, such as chat logs, ad hoc snippets, and personal notes. These may assist humans, but should not serve as authoritative inputs for production-impacting code.
The problem is that agent retrieval systems blur all of this together. They surface what scores highest for relevance, not what has the strongest provenance.
As a result, a stale wiki page may outrank a reviewed API specification, and a workaround from an incident thread can inadvertently become policy. This is how non-malicious poisoned context can be accepted as truth.
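The difference between relevance-first and provenance-first ordering can be sketched in a few lines. This is an illustration under assumptions: the `Tier` labels reuse the authoritative, operational, and ambient categories, the relevance scores are invented, and classifying the wiki page as ambient is a choice made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    AUTHORITATIVE = 0   # owned, versioned, change-controlled
    OPERATIONAL = 1     # incident notes, postmortems, workarounds
    AMBIENT = 2         # chat logs, ad hoc snippets, personal notes


@dataclass(frozen=True)
class Candidate:
    name: str
    tier: Tier
    relevance: float    # retriever similarity score, higher is better (invented values)


def rank_by_provenance(candidates):
    """Order by tier first; relevance only breaks ties within a tier."""
    return sorted(candidates, key=lambda c: (c.tier, -c.relevance))


candidates = [
    Candidate("stale wiki page", Tier.AMBIENT, 0.93),
    Candidate("reviewed API specification", Tier.AUTHORITATIVE, 0.81),
    Candidate("incident workaround", Tier.OPERATIONAL, 0.88),
]

by_relevance = sorted(candidates, key=lambda c: -c.relevance)
by_provenance = rank_by_provenance(candidates)
print([c.name for c in by_relevance])
print([c.name for c in by_provenance])
```

Sorting on relevance alone puts the stale wiki page first; sorting on tier puts the reviewed specification first and uses relevance only within a tier.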
One option is to flag sources as "trusted" or "untrusted", track the provenance of each piece of context, and place a gateway in front of the agent. The gateway would rank and filter on those signals and produce a curated context payload before anything reaches the agent.
The same gateway could enforce policies around context usage, such as requiring human review for certain types of context or limiting how much context an agent receives. It would act as the single source of truth for what context is provided, and it would run across multiple agent instances to keep behavior consistent and policy enforcement centralized.
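A rough sketch of such a gateway follows. Everything here is hypothetical: the `ContextItem` and `GatewayPolicy` types, the `kind` labels, and the character-based size cap are stand-ins for whatever trust flags and limits a real platform team would define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    source: str
    trusted: bool       # trust flag assigned upstream (assumed to exist)
    kind: str           # hypothetical labels, e.g. "api_spec", "incident_note", "chat_log"
    relevance: float
    text: str


@dataclass(frozen=True)
class GatewayPolicy:
    review_required_kinds: frozenset   # kinds held back for human review
    max_chars: int                     # cap on total context passed to the agent


def curate(items, policy):
    """Return (payload, held_for_review, rejected) for one agent request."""
    payload, held, rejected = [], [], []
    used = 0
    for item in sorted(items, key=lambda i: -i.relevance):
        if not item.trusted:
            rejected.append(item)              # untrusted sources never reach the agent
        elif item.kind in policy.review_required_kinds:
            held.append(item)                  # needs a human decision first
        elif used + len(item.text) <= policy.max_chars:
            payload.append(item)
            used += len(item.text)
        else:
            rejected.append(item)              # over the size budget
    return payload, held, rejected


policy = GatewayPolicy(frozenset({"incident_note"}), max_chars=40)
items = [
    ContextItem("spec-repo", True, "api_spec", 0.8, "POST /refunds requires idempotency key"),
    ContextItem("chat-export", False, "chat_log", 0.9, "just skip the key check"),
    ContextItem("incident-tracker", True, "incident_note", 0.7, "temporary retry workaround"),
]
payload, held, rejected = curate(items, policy)
print([i.source for i in payload], [i.source for i in held], [i.source for i in rejected])
```

Because the policy is data rather than code inside each agent, the same `curate` call can sit in front of every agent instance.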
The Wrong Fix Is Trying to Bless Everything
Some teams attempt to address this by creating a registry of approved knowledge sources with designated owners, versions, and mandatory traceability.
The intention is correct, but execution often falls short.
Attempting to govern every document, note, and operational detail creates a parallel documentation system that is difficult to maintain. Teams may stop updating it, guidance shifts back to tickets and chat threads, and the registry becomes outdated, creating a false sense of accuracy.
It is unnecessary to approve every piece of context. Instead, establish a contract specifying which sources may influence production-impacting changes in higher-risk workflows. Set flags for uncertainty to help human reviewers focus on critical tasks.
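As an illustration, such a contract can be as small as a mapping from workflow to permitted source types, with anything outside it flagged for the reviewer rather than silently dropped. The workflow names, source names, and the `uncertainty_flags` helper below are hypothetical.

```python
# Hypothetical contract: which source tiers may influence changes in each workflow.
CONTRACT = {
    "payments-change": {"authoritative"},
    "internal-tooling": {"authoritative", "operational"},
}


def uncertainty_flags(workflow, sources_used, contract=CONTRACT):
    """Return human-readable flags for sources outside the workflow's contract."""
    allowed = contract.get(workflow)
    if allowed is None:
        return [f"no contract defined for workflow '{workflow}'"]
    return [
        f"{name}: tier '{tier}' is not permitted for '{workflow}'"
        for name, tier in sources_used
        if tier not in allowed
    ]


flags = uncertainty_flags(
    "payments-change",
    [("payments-api-spec", "authoritative"), ("incident-thread", "operational")],
)
print(flags)
```

The contract covers only the workflows listed in it, which keeps the governed surface small while still telling a reviewer where to look.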
Building a Context Bill of Materials (CxBOM)
For designated repositories, treat context as a supply chain input.
If an agent-assisted change affects payments, identity, customer data, pricing, access control, or other sensitive areas, the merge request should include sufficient information to reconstruct the context behind the change.
This process does not need to be complex; a minimal Context Bill of Materials is sufficient. Most of this information can be captured from the agent's execution logs and metadata. The key is tying the context back to the specific pipeline and the change set in the merge request. At a minimum, each entry should answer:
- Which system did the context come from?
- Who owns it?
- What version was retrieved?
- What type of source was it: authoritative, operational, or ambient?
- Which pipeline run ties back to the change?
This is the minimum required to address questions during reviews, audits, and incidents. Without this, teams are forced to make assumptions after the fact.
Furthermore, it gives reviewers enough lineage to decide whether the change should be trusted in the first place.
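A minimal CxBOM covering those five questions could look like the sketch below. The field names, the `build_cxbom` helper, and every identifier and version string are placeholders for illustration, not a proposed standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CxBomEntry:
    source_system: str   # which system the context came from
    owner: str           # who owns it
    version: str         # what version was retrieved
    source_type: str     # "authoritative", "operational", or "ambient"


def build_cxbom(pipeline_run, merge_request, entries):
    """Bundle entries with the pipeline run and merge request they belong to."""
    allowed = {"authoritative", "operational", "ambient"}
    for entry in entries:
        if entry.source_type not in allowed:
            raise ValueError(f"unknown source_type: {entry.source_type}")
    return {
        "pipeline_run": pipeline_run,
        "merge_request": merge_request,
        "entries": [asdict(entry) for entry in entries],
    }


cxbom = build_cxbom(
    pipeline_run="run-0001",      # placeholder identifiers
    merge_request="mr-0001",
    entries=[CxBomEntry("api-spec-repo", "payments-platform", "v2.3.1", "authoritative")],
)
print(json.dumps(cxbom, indent=2))
```

Serializing the record as JSON makes it easy to attach to the merge request or store alongside the pipeline artifacts.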
Note: This discussion focuses on context consumption specifically. Code produced by the agent should also be traceable to the model, tools, and prompt that generated it, but that is a separate topic.
This Is Also Why Tool Evaluations Never Really End
When the context layer is unmanaged, every tool evaluation becomes a judgment about trust rather than capability.
Evaluations have to cover retrieval behavior, source handling, and provenance in addition to model quality, and those properties change with each tool. This often leads organizations to repeat evaluations without gaining confidence.
The trust boundary stays unstable for as long as the context layer remains unmanaged.
A shared context contract stabilizes the trust boundary, allowing tool changes to become configuration decisions rather than requiring full re-validation.
How I Would Approach This for Rolling Out Agentic Workflows in Enterprises
Extend a platform engineering approach by identifying common workflows that development teams use and where AI can accelerate processes and improve developer experience. The key question is not only whether the agent is permitted, but whether the sources influencing the change are controlled, attributable, and reconstructible. By developing a context layer processing pipeline that enforces a contract for context sources, you can create a stable foundation for agent workflows.
Develop golden workflows that account for the AI agent (including tools, models, and context sources) and the workflow it follows (invocation, project policies, and review checkpoints) as a single entity. This approach helps prevent gaps between teams where risks may go unaddressed.
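One way to treat the agent and its workflow as a single entity is to describe both in one record and derive an identity from all of it. The sketch below is an assumption about how that might look; `GoldenWorkflow`, its fields, and the example values are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, replace


@dataclass(frozen=True)
class GoldenWorkflow:
    name: str
    model: str
    tools: tuple
    context_sources: tuple
    invocation: str
    project_policies: tuple
    review_checkpoints: tuple

    def fingerprint(self) -> str:
        """Stable hash over every field, so any change yields a new identity."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Placeholder values for illustration only.
base = GoldenWorkflow(
    name="mr-review",
    model="model-a",
    tools=("ticket-reader",),
    context_sources=("api-spec-repo",),
    invocation="on-merge-request",
    project_policies=("no-ambient-sources",),
    review_checkpoints=("human-approval",),
)
swapped = replace(base, model="model-b")
print(base.fingerprint(), swapped.fingerprint())
```

Swapping the model, a tool, or a context source produces a different fingerprint, so a change to any part of the workflow is visible as a change to the whole.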
This is where I see enterprises building AI workflows struggle to produce meaningful outcomes. They fail to address the context layer properly, which creates a trust boundary that is unstable and difficult to manage at scale. Until the context supply chain is governed with the same rigor as the code supply chain, agentic workflows will remain difficult to trust.