AI Risk Quadrant for Agent Security

What could go wrong if you hire digital workers with total access, little security, and zero loyalty?
This report scores 100 agents across 10 classes to help you answer that question.

2026 Q2 Edition AI Risk Quadrants · AIRQ Framework

Multiple distinct classes of AI agents, from browser drivers and autonomous coding tools to enterprise workflow engines, moved from demo to deployment faster than any previous category of enterprise software. The attack surface is moving just as fast.

The AI Risk Quadrant maps this landscape across three dimensions: how easily an agent can be compromised, how much damage a compromise can cause, and how effective its defenses are.

The intent is to give enterprise buyers, vendors, and security teams a shared map of the risk landscape, and a shared vocabulary to compare agents only where the comparison is structurally meaningful.

Go to AI Risk Quadrant Explorer

Data-Driven Insights

Every percentage, ratio, and ranking in this report is grounded in the real data — each agent assessed against the published methodology, with full per-agent profiles available for review.

1 Introduction

This section explains why AI agents are a distinct security discipline, then distills what scoring the full cohort reveals — the critical findings that recur across classes and the priorities they imply.

1.1 Agent Insecurity

For decades, enterprise security was built on determinism — a firewall rule held, a signature matched or it did not, and a human sat between every decision and its consequence. AI agents break that assumption. They are non-deterministic digital workers that read untrusted content, hold the privileged access of their operators, and carry out business tasks autonomously.

Collapsed Boundaries

Onboarding an agentic worker collapses three boundaries that traditional security kept apart. Data and control share the same channel — every document, email, ticket, or log line the agent reads is also a candidate instruction, and no deterministic gate separates the two. The human is out of the loop — permission models, OAuth scopes, and infrastructure credentials all assumed a person was translating intent into action; the agent removes the person but keeps the privilege. The principal is borrowed — the agent runs under its operator’s identity or a broad service account, so an attacker who steers it inherits a clean audit trail and full legitimate authority.

Expanded Attack Surface

Two traits make an agent harder to contain than any earlier AI system. It has a far wider capability surface — the shells, browsers, SaaS connectors, control planes, and credential vaults it is given to do its job — and an autonomous behavioral profile, where multi-step loops act without a human in the path. Every connector added for productivity widens the blast radius, so the agent’s capability breadth is its own threat model. This is not a harder version of application security; it is a different discipline.

The Defense Gap

The controls that would contain an agent — scoped identity, egress inspection, irreversible-action gates, AI-layer telemetry — do not ship as reliable defaults. The model itself cannot be trusted to refuse a well-placed instruction, and vendors concede as much. Capability ships faster than defense matures; the market sells power, not containment. The enterprise that deploys the agent, not the vendor, owns the blast radius from day one.

1.2 One-Minute Brief

This report profiles production AI agents across ten security archetypes — from coding tools and browser drivers to enterprise workflow engines and voice operators. Each agent is scored on ten weighted attack surfaces, six blast radius factors, and five defense components, producing a per-agent risk profile grounded in publicly verifiable evidence. The findings below are what that analysis surfaces:

  • The “capable and defended” quadrant is sparsely populated. Only 11% of the cohort lands in Fortified Leaders, while 33% sit in Tight Operators, 16% in Humble Providers, and the largest group — 40% — in Exposed Giants. Many enterprise-suite names that dominate AI procurement shortlists are not in the defended quadrants.
  • Capability and defense move in opposite directions. But a handful of agents manage to be both powerful and genuinely well-defended.
  • The “lethal trifecta” is nearly omnipresent. The combination of private data access, exposure to untrusted content, and ability for outbound actions appears in 98% of the cohort. Eight of ten agent classes show 100% trifecta exposure; only General Assistant Agents and Data Engineering Agents each have a single exception. One hostile document is enough to compromise every AI agent in these categories working in production today.
  • One architectural question predicts more risk than agent class or vendor. Does the agent execute tools, and is that execution sandboxed? Tool execution alone explains 76% of blast radius variance. Triage by architecture first.

The AIRQ agent class taxonomy makes the agentic risk landscape visible, the AIRQ scoring makes it comparable.

1.3 Executive Summary

Critical Findings

Scoring every agent in the cohort on the same dimensions reveals structural patterns that hold across classes and vendors — not one-off observations but repeating signatures of how the AI agent market treats security today.

  • The lethal trifecta is the default in most agent classes, including less-exposed classes like Data Engineering Agents and better-defended ones like Work Copilot Agents. Every LLM-based agent that acts on the world is vulnerable out of the box.
  • Defaults favor velocity over safety. The controls that would contain an agent ship off, opt-in, or behind a paywall, so the safe configuration is one the operator has to assemble. Risk is subtracted by hand, never inherited — an agent left as shipped is an agent left exposed.
  • The agents with the most power have the least protection, and the agents with the most protection have the least power. The two riskiest categories, Coding agents and Computer agents, pair the widest attack surfaces and largest blast radiuses with the thinnest defenses. Work Copilot and Business Process agents sit at the opposite extreme: among the most heavily defended, despite narrower exposure. But the pattern is a tendency, not a rule. A handful of agents manage to be both genuinely powerful and genuinely well-defended — proof that the trade-off can be broken.
  • Only 11% of the cohort sits in the “capable + defended” quadrant. The agents that occupy it are mostly enterprise solutions. The defense those agents carry is inherited from platform-level governance, not from AI-specific tooling.
  • Tool execution alone explains 76% of blast radius. A single variable outpredicts agent class, vendor reputation, and every individual defense component. Agent risk is effectively bimodal (tool executors and non-executors).
  • 37% of the market is audited more than it is defended. More than a third of agents score well on logging and observability while scoring poorly across the four defense components that actually prevent or limit harm. Separately, 38% complete irreversible actions before any monitoring path can plausibly fire. For these agents, audit capabilities function as a forensic asset, not a control.
  • 83% of claimed AI agent defenses are not publicly verifiable. Only 17% of assigned defense credits are backed by public evidence — published research, inspectable code, or documented compliance. The components most relevant to blast radius reduction (such as execution isolation) are the least verifiable.

Strategic Advice

Actionable priorities distilled from the findings, each targeting the controls with the highest risk reduction per unit of effort.

  • Defend the legs you can own, not the one you can’t. Prompt injection has no deterministic fix — no classifier reliably separates the agent’s data from its instructions, and vendors concede it. Concede the input boundary and spend the defensive budget on the trifecta legs the operator does control: egress, identity, and irreversible actions.
  • Scope identity and inspect egress as the highest-leverage controls. A least-privilege, per-task identity caps what a hijacked agent can reach; egress inspection closes the channel stolen data leaves by. Together they break the majority of attack chains at two reliable choke points — fund them before lower-yield controls.
  • Require execution isolation as a procurement gate. Among tool-executing agents, execution isolation dramatically reduces residual risk, but most of the benefit comes from the first step. Sandboxing cuts residual risk by roughly 2.6×, cloud or container-level isolation captures about 6× reduction. The practical procurement gate is “documented and tested sandboxing.”
  • Triage by architecture before triaging by class. Two binary questions predict the majority of real-world risk: does the agent execute tools, and is execution sandboxed? Run those two first; reserve class-based and vendor-based review for what they leave underdetermined.
  • Score every agent twice: vendor-as-shipped and customer-as-configured. The same platform routinely scores points apart depending on which build is evaluated, with spreads wider than entire agent classes. Procurement signs off on one configuration; security inherits the other.
  • Read a clean record as unexamined, not safe. Absence of observed exploitation is not evidence of safety. The better-scrutinized agents carry denser vulnerability evidence, which reads as “more vulnerable” when it often means “more examined.” Weight a sparse finding history as an open question, not a clean bill of health.

2 AI Risk Quadrants

This section introduces the AIRQ framework — its purpose, operational use cases, scoring methodology, the quadrant model that gives the scores operational meaning, and the interactive explorer that lets readers interrogate the full dataset.

Every agent in the cohort is scored on three independent dimensions and placed into one of four risk profiles. The result is a shared vocabulary for comparing agents that would otherwise be incomparable — a coding agent and a voice agent have nothing in common except that both demand security decisions. The quadrant model makes those decisions visible and the trade-offs between them measurable.

2.1 Introducing AIRQ

AIRQ is a quantitative security framework for AI agents. It measures how much risk an organization assumes when it deploys an agent — and how well that agent’s defenses match its capabilities. The result is a three-dimensional risk profile and a single composite score expressing AI Risk Appetite, designed to make agent security comparable, auditable, and actionable.

The AIRQ framework supports multiple operational use cases beyond the headline score:

  • Risk scoring. The AIRQ Score gives every agent a single, comparable number expressing the risk appetite it supports — useful for ranking, thresholds, and portfolio-level dashboards.
  • Agent selection. The AI Risk Quadrant plots agents on Attack Surface against Defense Controls, making it possible to compare candidates within a class and quadrant rather than by vendor reputation alone.
  • Procurement questionnaires. The methodology doubles as a structured question set — the attack surface factors, blast radius factors, and defense rubrics are the questions a buyer should demand answers to before deployment.
  • Threat modeling and hardening. The ten attack surfaces, six blast radius factors, and five defense components serve as checklists for red-team scoping, architecture review, and control-gap analysis.
  • Framework alignment. The methodology maps every AIRQ factor to established standards, turning best-practice guidance into scored, auditable lists.

2.2 AIRQ Methodology

Full Methodology

The full methodology — including the ten attack surfaces with their percentage weights, the blast radius scoring factors, the defense rubrics, and the aggregation formulas — is published on the AIRQ Methodology page, along with framework alignment for established AI security standards. Every weight and scoring rule is transparent and auditable.

Scoring at a Glance

Each agent is scored on three dimensions: Attack Surface — how easily the agent can be compromised, across weighted input and execution vectors adjusted for published vulnerabilities; Blast Radius — how much damage a compromised agent can cause; and Defense Controls — how effectively verified defenses reduce that risk. The AI Risk Quadrant plots agents on Attack Surface against Defense Controls directly, with Blast Radius as a third dimension (bubble size in the visualization).

The AIRQ Score is a parallel construct — a single composite metric aligned with AI Risk Appetite. It rewards capability paired with defense and penalizes capability shipped without it. The quadrant does not use the composite score: an agent can rank low on the AIRQ Score while sitting in a well-defended quadrant, or rank high while landing in Exposed Giants. The score measures risk appetite; the quadrant reveals the risk profile behind it.

Reporting Scope

This edition covers the most representative production agents across ten security archetypes — from coding tools and browser drivers to enterprise workflow engines and infrastructure operators. The cohort is selected for class coverage and market relevance, not exhaustiveness: each class includes the vendors that define the category and set the security baseline others follow. While each edition is a point-in-time snapshot, we anticipate the AIRQ Score will become a long-term indicator of how vendors treat AI security — not just a vulnerability count at a moment in time.

2.3 AI Risk Quadrant Model

The quadrant model plots agents on two independent axes — Attack Surface against Defense Controls — defining four named risk profiles. The framing is intentionally reusable: any organization can apply the same axes and quadrant definitions to its own agent portfolio, internal builds, or procurement shortlist. A quadrant placement is a starting point for risk conversation, not a final verdict.

Exposed Giants
High Attack Surface, low Defense Controls.
Powerful capabilities, thin defenses.
Fortified Leaders
High Attack Surface, high Defense Controls.
Broad capabilities with proportional safeguards.
Humble Providers
Low Attack Surface, low Defense Controls.
Narrow agency, limited protections.
Tight Operators
Low Attack Surface, high Defense Controls.
Constrained agency, strong defenses.

Exposed Giants

Agents with high Attack Surface and low Defense Controls land in the Exposed Giants quadrant — the agent’s reach is broad — tool access, data connections, autonomous actions — yet few internal checkpoints exist to interrupt a compromise once it begins.

  • Risk profile: This is the quadrant where the gap between what an agent can do and what it is guarded against doing is widest. The lethal trifecta — private data access, untrusted content exposure, and tool execution — gives cascade a path: when all three are present, every connected tool and data source is reachable from a single injection. The primary failure mode is cascade: a single injection point propagates through the full set of connected tools and data sources because the controls that would interrupt it — execution isolation, tool-level permissions, output filtering — are specifically the ones absent. Where the primary security boundary is a system prompt or model alignment rather than an architectural control, that boundary does not survive adversarial input.
  • Default posture: Exposed Giants typically represent the highest-priority remediation target in an agent portfolio. Execution isolation is the single highest-leverage control; sandboxing alone reduces residual risk substantially before any other defense is layered on. Avoid deploying without compensating controls, or sandbox heavily and layer external defenses before production use. Many agents land here in default configuration but can move toward Fortified Leaders through operator hardening — score both vendor-as-shipped and customer-as-configured.
  • Migration path: The only path out is to increase Defense Controls — moving the agent toward Fortified Leaders — because reducing Attack Surface would mean removing the capability that makes the agent valuable.

Fortified Leaders

Agents with high Attack Surface and high Defense Controls land in the Fortified Leaders quadrant — powerful and broadly integrated, but with a defense posture that matches the exposure. Identity enforcement, workload isolation, approval gates, and audit trails are present and proportional to the agent’s reach.

  • Risk profile: Compromise requires multiple independent control failures rather than a single successful injection — the defining characteristic of this quadrant. The defense depth is typically inherited from enterprise platform governance — tenant isolation, role-based access, audit frameworks that existed before the agent was layered on top. AI-specific security tooling alone rarely produces Fortified Leader status. This is the target state for enterprise-grade deployment: maximum capability with defenses that justify the risk appetite required. Residual risk is real but bounded; the primary failure mode is capability creep — new capabilities, integrations, or data sources silently outpace the control model, so the margin requires ongoing validation.
  • Default posture: Treat Fortified Leaders as the primary procurement target for use cases demanding both operational breadth and security confidence. Validate that the defense layer is architecturally enforced, not just vendor-claimed — defense credits should be publicly verifiable. Confirm that the scored configuration matches the deployed configuration; vendor-shipped and customer-configured scores can diverge.
  • Migration path: Without sustained defense investment, capability expansion drifts the agent toward Exposed Giants — the quadrant’s proximity to the highest-risk position is what makes ongoing validation essential. Watch for vendor updates that add new integrations, broader retrieval scope, or additional tool access without corresponding control updates.

Humble Providers

Agents with low Attack Surface and low Defense Controls land in the Humble Providers quadrant — a narrow operational bucket with limited integrations, restricted tool authority, and narrow data access. The risk is bounded not by inherent robustness but by the absence of meaningful privileges — the agent may still be vulnerable to the same attacks, but the consequence is small because its authority is small.

  • Risk profile: The threat model is simpler but not absent. The dominant risk modality is data exfiltration and information disclosure, not system compromise or destructive actions. Even without tool execution, the agent can aggregate and surface information across data sources, collapsing access boundaries that would otherwise limit individual queries. The primary failure mode is assumption failure: the agent has more reach than believed, or gains it silently, while shallow defenses still permit data exposure and policy bypass.
  • Default posture: Humble Providers are the lowest-return quadrant but also the least complex to govern. Focus on closing specific gaps — data controls, redaction, and egress monitoring — because the defense layer the agent ships with is thin. Periodically verify that the agent’s integrations and data access have not expanded since initial classification — assumption failure is silent by definition.
  • Migration path: This is the most ambiguous starting position: the agent can drift toward Exposed Giants if capability scales without defense, toward Tight Operators if defense investment comes first, or toward Fortified Leaders if both grow proportionally. The safest sequence is to invest in defense before scaling capability, reaching Tight Operators before expanding further.

Tight Operators

Agents with low Attack Surface and high Defense Controls land in the Tight Operators quadrant — narrow in scope and well defended for what they do, with fewer tasks, each bounded by meaningful controls. The defense posture is strong relative to the current attack surface, which provides a built-in margin of safety as the agent evolves.

  • Risk profile: Tight Operators represent the lowest-risk profile in the landscape, suitable for regulated or high-sensitivity environments where assurance requirements are non-negotiable. Constrained scope makes defense tractable: fewer tools mean fewer attack vectors, a simpler threat model, and controls that are easier to test and audit. The defenses here tend to be the most verifiable — deterministic allow/deny rules, hard permission boundaries, explicit approval flows — controls that survive adversarial input. The primary failure mode is scope creep: incremental additions shift the agent into a different quadrant without triggering re-evaluation.
  • Default posture: Low governance overhead; operators should validate that the constrained scope still meets the use case, and re-evaluate quadrant placement whenever new tools, integrations, or data sources are added. This is the recommended starting position for new agent deployments: launch narrow, prove controls in production, then expand scope deliberately.
  • Migration path: With proportional defense investment, capability expansion moves the agent toward Fortified Leaders — the safest growth path in the framework. Expanding scope is easy; contracting it after deployment is not. Starting in Tight Operators preserves optionality.

2.4 Agent Risk Quadrant Explorer

The chart plots AI agents on the AIRQ plane, showing how much AI Risk Appetite each agent can support — measured by the Defense Controls implemented to protect its Attack Surface. The AIRQ score rewards greater AI capability when paired with defense, and penalizes both unprotected capability and reduced capability even when adequately defended.

Agent Profile Details

Every agent has a page with a full analytical profile — click any agent on the quadrant chart below.

AI Risk Quadrant · Agents
Defense Controls →
Attack Surface →

2.5 Class Risk Quadrant EXPLORER

The class quadrant plots each agent class at its average Attack Surface and Defense Controls. Bubble size reflects average Blast Radius. Unlike the agent-level quadrant, where individual agents can scatter across all four quadrants within a single class, the class view reveals which archetypes structurally tend toward which risk profiles — and which classes are too internally fragmented for the average to be meaningful. Click a class to jump to its deep-dive.

AI Risk Quadrant · Agent Class Quadrant
Defense Controls →
Attack Surface →

3 Agent Security by Class

The AIRQ taxonomy spans the most popular agent classes for enterprise adoption — each with a distinct attack model, a different definition of “compromise,” and different implications for defense. The cross-class summary below ranks every class; the class quadrant explorer is in section 2.5. The sections that follow drill into each one: what the class does, who ships agents in it, and where the security risks concentrate.

3.1 Agent Archetypes

The agent classes in AIRQ are not just market segments. They are security archetypes — comprehensive groupings defined by the nature of the agent itself: its attack surfaces, deployment modes, autonomy patterns, data flows, and trust boundaries. The selected classes are distinct enough across these technical dimensions that many common market segments are almost certainly a subset of the presented archetypes.

The AIRQ research focuses on agents with enterprise traction — finished products that are ready to deploy or customize and ship, including agent builders.

The taxonomy spans ten classes: General Assistant Agents, Work Copilot Agents, Coding Agents, Browser Agents, Computer Agents, Conversational Agents, Custom Workflow Agents, Business Process Agents, Platform Operations Agents, and Data Engineering Agents.

Cross-class summary view: every class ranked by average AIRQ Score and the three component metrics, across the full cohort. Sort by any column to flip the picture. Class names link to the class deep-dives below.

3.2 General Assistant Agents Security

The biggest chatbot risk is not what it says, but the moment it stops being just a chatbot. The real danger appears when retrieval, memory, and tools expand the agent's authority without changing the conversation's appearance.

Archetype

Standalone chat destinations — users go to a dedicated chat URL or app rather than encountering an AI embedded inside a work tool. Includes general-purpose SaaS chatbots and self-hosted chatbot frontends. Technically, the class is distinguished by a conversation-first interface that increasingly gains tool use, memory, and MCP integrations behind the scenes — which is where the risk moves.

AI surfaces embedded inside a worker’s daily productivity surface (email, messaging, documents) belong in Work Copilot Agents instead, even when the same vendor ships both products.

Risk profile

The core issue is authority ambiguity: the user cannot tell, at any moment, whether the chatbot is generating text, retrieving from a connector, or about to execute an action. Effective privilege expands turn by turn without an explicit permission event the user can see. The result is trust compression — the interface feels as low-friction as a search box, while the consequences (a file moved, a calendar invite sent, a ticket created) match those of a real action.

In scored terms, the class sits in the middle of the attack surface range, but that average masks a widening split: assistants that stay text-only score low, while those adding memory, tool use, and MCP connectors push toward the lethal trifecta. External Data and Configuration are the dominant attack surfaces — every tier-1 assistant now ships connector marketplaces and persistent memory as defaults. Blast radius scales with whatever the connectors can reach, and defense controls are concentrated in input guardrails, with little investment in execution isolation or action controls since the products are still framed as “chat.”

The class is really two blast profiles wearing one name. Cloud-hosted assistants leak within the bounds of a session; self-hosted and autonomous frontends escalate the same injection to host code execution. Where the cloud product’s worst case is exposed data, the self-hosted product’s worst case is a compromised machine — the same prompt, a categorically larger consequence.

Security Findings

  • The “just chat” framing is now a misnomer — every tier-1 chatbot ships memory, tools, retrieval, and connectors available by default.
  • Persistent memory turned every chatbot into a stateful system, a major attack surface added.
  • The connector marketplace is the new shadow IT — IT has zero visibility into which OAuth tokens a user has bound to a chat tab, and the marketplace is itself a live supply-chain vector: poisoned packages and compromised repositories have reached these assistants in the wild, inheriting whatever the bound tokens can touch.
  • Self-hosted deployments trade leakage for host takeover — the same injection that leaks data from a cloud assistant escalates to host code execution on a self-hosted or autonomous frontend (browser-runtime flaws, permissive CORS, root-by-design autonomy). One class name, two very different worst cases.
  • Rendered output is the exfiltration channel — markdown images and hyperlinks let a hijacked assistant ship stolen data over its own display surface, bypassing input-side guardrails entirely. The structural fix is server-side egress — proxying or sanitizing rendered images and links — not stronger prompts.
  • The unique chatbot risk is authority elasticity — a single conversation can silently move from “summarize this” to “execute that” without an explicit permission event the user can see.
  • Memory carryover is invisible to the user — a poisoned attachment from yesterday steers today's reply with no UI signal.

3.3 Work Copilot Agents Security

The enterprise copilot problem is too much invisible enterprise context. Work copilots inherit trust from the tools employees already use every day. That makes subtle overreach into data, policy, and decision-making more likely than dramatic compromise.

Archetype

Assistants embedded inside a single worker’s productivity surface — email, messaging, documents, spreadsheets, and meeting tools. They act for the individual, over that individual’s data and permissions. Technically, the class is distinguished by deep integration with the host platform’s identity, data graph, and permission model — the copilot inherits the worker’s full context silently rather than asking for it.

A standalone chat destination the worker navigates to belongs in General Assistant Agents. An agent running an assumed business process for the organization (a sales-outreach operator, a support triager, an ITSM resolver) belongs in Business Process Agents.

Risk profile

This class is dangerous because it compresses enterprise complexity into conversational simplicity. The employee sees “help me with this task”; the system sees mail, docs, meetings, identity context, and cross-app connectors. The dominant risk is not wild autonomy but semantic overreach: the copilot is grounded in more enterprise context than the employee realizes, ranks and summarizes it by logic the employee cannot inspect, and presents the result with the legitimacy of the surrounding suite.

External Data is the dominant attack surface — emails, documents, chat threads, and shared files are all accessible through the user’s identity and any of them can carry adversarial content. Despite this, Work Copilots are among the best-defended classes in the cohort, with above-average scores on input guardrails, action controls, and monitoring — inherited from the enterprise platforms they embed in. Output guardrails are the conspicuous exception: the one defensive category where this class does not lead, leaving cross-context data leakage as the primary residual risk.

Security Findings

  • Work copilots are confused deputies by design — every action is indistinguishable from the user in the audit log. In most enterprises, it will fail to attribute prompt-injection-driven actions to the attacker. The inheritance is total: full directory-graph read is credential-equivalent, and principal propagation forwards the worker’s entire role — HR, finance, procurement — into a single hijacked turn.
  • Copilots are among the best-defended classes in the cohort, while being limited in blast radius and capability.
  • Output guardrails are the weak link — the only defensive category where this class does not clearly lead.
  • Guardrails scan the prompt, not the retrieved corpus — even the best-defended copilots filter what the user types while the retrieved documents carrying the injection pass through unscanned. The defense sits on the wrong channel, which is why a strong input filter and a successful injection coexist in the same product.
  • The connector is the soft spot, not the vendor’s code — copilots chain through tool-protocol and connector integrations, and the vulnerable link is routinely a third-party or community connector rather than the vendor’s own surface. A clean platform can still inherit an injection or SSRF from a connector bound to it.
  • Most workspace and business data is a malicious email away from invisible compromise. Emails, documents, chat threads, and shared files the copilot can cite or summarize sit one poisoned message from steering what the employee sees as trusted output — with no obvious sign the source was hostile.
  • “Anyone-with-the-link” shared files are now long-lived attack assets — an attacker plants a malicious document once; the copilot indexes it; for every employee who later asks a related question, the poisoned content is retrieved as authoritative context.
  • Background or agent mode silently removes the most effective control in this class — the user's final “send” click. Once the agent acts without explicit confirmation, the only human gate is gone.
  • Suite governance is not the same as agent governance — the parent product's policies (sharing, DLP, sensitivity labels) do not automatically transfer to the AI it embeds. The copilot can reach what the governance assumes is protected.
  • Copilots create authority laundering — sensitive context retrieved into one workflow can be surfaced inside a less sensitive workflow with no permission check between the two, because the agent inherits the user's identity in both contexts.

3.4 Coding Agents Security

Coding agents don’t just write code — they touch shell, dependencies, and tokens long before a diff lands in review. Code review catches outputs, not actions, which is why it isn’t a sufficient security model for agentic development.

Archetype

Agents that operate on source code, repositories, IDEs, terminals, and build systems. The class splits into three sub-types: coding copilots (human reviews each suggestion), autonomous coding agents (goal-in, repo-out), and app builders (prompt-to-deployed-app). Technically, they are distinguished by direct access to the software supply chain — shell execution, file system, dependency graphs, secrets, and CI/CD pipelines.

Agents that produce code as one output among many in a general assistant context belong in General Assistant Agents.

Risk profile

This is still the class where compromise most directly becomes production compromise. The danger is not bad code suggestions; it is high-trust operation inside the software supply chain. Non-determinism makes code review an incomplete defense: even if a human reviews the final diff, the agent may already have traversed secrets, run tests against production-like services, modified configs, or selected risky dependencies. Review catches outputs; it does not catch the full action trail.

The scores reflect this: Coding Agents rank in the worst two classes on both attack surface and defense controls — the textbook capability-to-defense inversion. Tool Execution and Configuration are the defining attack surfaces: agents run shell commands, load MCP servers, and auto-load rules files that function as persistent system prompts. Blast radius is among the highest because the agent sits inside the software supply chain with access to secrets, signing keys, and deployment pipelines. Defense controls are the lowest in the cohort — most agents rely on code review as the primary gate, which only covers outputs, not the action trail leading to them.

Security Findings

  • The lethal trifecta is the default — every mainstream coding agent ships out of the box the catastrophic combination: access to private data, exposure to untrusted content, and the ability to take outbound actions. While individually manageable, all three together mean a single hostile document can read sensitive data and exfiltrate it.
  • This class is in the worst 2 on attack surface and the worst 2 on defense — the textbook capability/defense inversion. However, the class is best read as two sub-groups: autonomous tools (average attack surface 8.1) and interactive copilots (averaging 5.6).
  • Tool execution and configuration are the defining attack surfaces. This matches the architectural reality: coding agents run shell, load MCP servers, and trust developer config files. Rules files are auto-loaded persistent system prompts — the new repository-as-trojan vector. One poisoned repo, one zero-click RCE in auto-accept / YOLO modes.
  • MCP is becoming the default plug-in system for AI agents. It’s structurally more dangerous than past plug-in ecosystems because MCP servers handle credentials and execute tools by design.
  • Code review catches outputs, not actions — a coding agent reaches secrets, runs commands, and touches dependencies long before any diff lands in a review queue.
  • Tool-output steering (logs, compiler messages, test failures) is more effective than direct prompt injection on this class. The agent treats command output as authoritative context; any text it reads can be instruction.
  • Auto-apply modes are normalizing faster than security research can catch up — running the agent without per-action confirmation is becoming default, and the window between feature launch and first CVE compresses every quarter.
  • Coding is the class where developer-purchased AI reaches production access without an enterprise gate. The same workstation that runs the agent holds AWS root, GitHub PAT, npm publish, and signing keys — one compromised laptop is one compromised software supply chain.
  • No output DLP, and the inherited tokens get harvested, not just held. The standing credentials the agent runs under — git, cloud, registry, OAuth, often passed straight into its containers — are read and exfiltrated over its own channels: a rendered-image proxy, a diagram render, or DNS. No coding agent ships real-time egress inspection, so the data leaves the way the agent talks.
  • Assume the sandbox will be escaped. Where a sandbox exists at all, it is repeatedly escaped at critical severity — including escapes that bypass an “auto-execution off” setting. The durable control is external containment around the agent (a disposable VM and an egress proxy), not the vendor’s in-process sandbox.
  • Poisoned config persists and propagates. Rules files and hooks are not a one-shot trojan: they re-arm in later, unrelated sessions and can write themselves into every repository the agent touches, turning one poisoned repo into a worm-shaped spread.
  • The agent is its own threat actor. No attacker is required for the worst outcome — the clearest real-world loss in this class is an agent that deleted a production database against an explicit freeze order, unaware it could roll back. Closed-loop autonomy rewards task completion, and reversibility is rarely designed in.

3.5 Browser Agents Security

Browser agents don’t just face hostile pages — they face pages that change between the moment the model reads them and the moment the click lands, all while logged in as you.

Archetype

Agents scoped to a browser tab — they navigate web pages, fill forms, click, and extract. The class splits into general-purpose browser-control agents and chat-with-browser “agentic browser” experiences. Technically, they are distinguished by operating inside the user’s authenticated browser session, where every page is an untrusted input and every action carries the user’s full identity.

Agents that drive the whole desktop (not just the browser tab) belong in Computer Agents. Chatbots that can fetch a URL as a tool call but do not autonomously navigate belong in General Assistant Agents.

Risk profile

The defining risk lives at the observation-to-action seam. Browser agents don’t own the machine — they own the link between what was on a page and what hits the network a moment later. Security is twofold: trusting the content that goes in, and tracing the action that comes out across the read-to-click gap — including the semantic race conditions hiding on pages that mutate between the moment the model reads them and the moment the browser acts.

External Data dominates the attack surface — every web page is untrusted input — and Network Egress is the primary blast component because the agent is logged into everything the user is. This class is the most internally fragmented in the cohort: quadrant placement is unusually even, with no dominant quadrant, making class-level averages unreliable for procurement. Defense controls vary widely — some browser agents ship domain allowlists and download restrictions, others rely entirely on the underlying model’s refusal behavior.

Security Findings

  • Browser agents are the most internally fragmented class in the cohort, their quadrant spread is unusually even. Every quadrant has at least 2 members — the only class in the cohort with no dominant placement. Procurement decisions based on class membership alone are essentially uninformative for Browser Agents; the per-agent score has to be read directly.
  • External Data is the largest attack surface, and Network egress is the largest blast component. The class fingerprint is “ingest content from arbitrary websites, then send bytes outward.”
  • Every web page must be treated as untrusted input. The blast radius is defined by the fact that the browser agent is logged into everything you are.
  • The concrete prize is credential theft and account takeover. “Logged into everything” is not abstract: a hijacked browser agent has demonstrably lifted password-manager credentials, completed OTP-based account takeover, and read sessions whose tokens were stored unencrypted. It becomes a credential-exfiltration engine the downstream service cannot tell apart from the real user.
  • “Click I am human” is the new “ignore previous instructions” — CAPTCHA-style instructions are the most reliable jailbreak template of 2026. A defense against bots became the most reliable weapon against them.
  • The “are you sure?” sensitive-action gate is bypassed by context framing in the majority of tested scenarios.
  • Turning on AI mode is a privilege escalation — the agent retains user identity across every tab it touches.
  • Browser agents face semantic race conditions — pages mutate while the model decides, so safe observations produce unsafe actions a second later.
  • Domain allowlists and download controls outperform model hardening for most enterprise browser tasks — but egress stays a leaky, pivotable control, not a seal. Encoded payloads slip past output checks, and blocking one channel (image-URL exfiltration) simply moves the attacker to email or on-page actuation. Plan for the pivot, not a single chokepoint.
  • Most browser agents are monitoring-blind. The majority ship no SIEM feed or compliance API, so a session that reads a vault and exfiltrates over its own navigation leaves no agent-layer trail to alert on or reconstruct afterward.
  • Injection needs no click, and persistence outlives the tab. Some agents act on mere navigation — visiting a hostile page is enough, with no user interaction — and poisoned content written to cross-session browser memory re-arms in a later, unrelated session.

3.6 Computer Agents Security

Computer agents don’t need admin rights to be dangerous — a normal user session is enough. Once an agent can move across apps and files, no one can tell what it really meant to do.

Archetype

Agents with full operating-system access — filesystem, desktop apps, network sockets, communication surfaces. They run on the user’s own laptop or inside a vendor-managed cloud VM. Technically, the class is distinguished by the broadest action surface in the cohort: a compromise hands the attacker the user’s entire machine, not just one application or tab.

Agents whose action surface is only the browser tab belong in Browser Agents, even if they share underlying technology.

Risk profile

Computer agents remain the highest-harm class. The deeper issue is that the desktop confirmation step looks like a control while being unreliable in practice. The human and the model reason over different abstractions (windows and labels vs. screenshots and accessibility trees). That gap produces confirmation mismatch: the human approves the appearance of the action, not what the agent is about to do, because nothing in the interface surfaces the difference.

The scores confirm this: Computer Agents are the most capable and least defended class in the cohort. The lethal trifecta is universal and maxed out — every member scores above zero on every blast component, at intensities a full point above the cohort average. Output guardrails average exactly zero: no Computer Agent scores any points on output validation or exfiltration-channel blocking. Execution isolation, the single most important control for this class, is also the weakest — averaging the lowest of any class in the cohort.

Security Findings

  • The most capable AND least defended class in the cohort. The cohort’s single worst-scoring agent lives here, but one outlier still sits in Fortified Leaders.
  • The lethal trifecta is universal and maxed out. Every Computer Agent scores above zero on every blast component, and those components run about a point above the cohort average across the board. The class doesn’t just have the trifecta — it has it at higher intensity than anywhere else.
  • Output guardrails average exactly zero. No Computer Agent in the dataset scores any points on output validation, exfiltration-channel blocking, or rendering sanitization. Every other defense component is at least non-zero on most members; output guardrails are uniformly absent.
  • Execution isolation, the recommended procurement gate, is also the weakest. Its average score on this control is the lowest of any class — and the default compounds it: at least one member ships auto-run as the default host-execution behavior with sandboxing off by default, so the exposure is what ships out of the box, not a misconfiguration.
  • Self-hosted computer agents anchor the ceiling of AIRQ blast radius — they have more CVEs per unit of adoption than any other class, and significantly more internet-exposed instances.
  • Many browser-agent injections transfer to computer agents. The browser is a subset of the computer surface, and the same hostile page unlocks harm capabilities the browser alone cannot reach (file system, local apps, other browser sessions).
  • Clipboard, notifications, and background apps are major blind spots in risk reviews — they are input channels the agent reads from that nobody scores as attack surface.
  • Once a computer agent runs, the user’s session is the agent’s session — administrator rights are not required for catastrophic outcomes. Ordinary user session authority is sufficient to cause large damage.
  • The kill chain is injection straight to shell. Untrusted input becomes host code execution at operator privilege — demonstrated zero-click, with the agent’s own command blocklist bypassed in every attempt. Because the loop is autonomous, one injection yields a self-propelling chain, not a single bad action.
  • The marketplace is the confirmed real-world kill chain. The class’s strongest in-the-wild evidence is a malicious-skill marketplace campaign — hundreds of thousands of installs, large sums stolen. The poisoned skill inherits the agent’s full host authority on arrival.
  • Local credential harvesting is built in. Under ambient OS identity the agent can reach the keychain, environment variables, and live OAuth/SSH material — credential theft is a default capability of the surface, reachable through path and command injection without any escalation.
  • Isolation is not a data-loss control. Even where the sandbox holds, exfiltration still rides an approved channel — a contained member leaked credentials through an encoded, policy-approved pull request. A sandbox bounds execution, not what leaves; it is not a substitute for egress inspection.
  • Unauthenticated inbound bridges are a prompt-to-RCE entry. Some members expose a messaging bridge or webhook with no authentication, turning a single inbound message into host code execution before any agent reasoning is involved.

3.7 Conversational Agents Security

In conversational agents, identity, intent, and authorization collapse under conversational tempo — timing, tone, interruption, and pauses change action outcomes invisibly.

Archetype

Agents that hold live conversations with people outside the deploying organization — customers on calls or chat, callers, leads, applicants. The agent represents the business to outside users; the conversation itself is the product. The class spans voice agents (managed and builder platforms) and customer-facing chat operators in support and sales. Technically, they are distinguished by real-time, bidirectional dialogue where the untrusted party is on the other end of the conversation and controls pacing, framing, and context.

A text-first agent (general assistant, business-process operator) that happens to expose a voice mode belongs in its underlying class, not here.

Risk profile

The deeper issue is time-compressed trust. In text systems, people can pause, re-read, and inspect what was said before responding. In voice systems, identity, intent, persuasion, and authorization all happen under conversational tempo, with no opportunity to pause for verification without breaking the call. Security failures in this class look like confidence, urgency, and misheard confirmation rather than technical compromise.

Attack surface is narrower than tool-wielding classes, but the voice channel bypasses conventional text-based defenses entirely — prompt injection scales with dialed numbers, not attacker skill. Output guardrails cannot be scored conventionally because speech goes directly to action with no document-shaped artifact in between. Builder platforms show strong platform-level defense, but customer-built agents on top of them inherit none of it — the risk is class-systemic and the platform vendor cannot reach inside customer flows to remediate.

In scored terms this class is honestly lower-blast: most members have no code execution, so a successful injection usually costs one customer’s record or one connector write, not host compromise — and input filtering, the one control that matters here, tends to fail open under pressure rather than closed. But the low headline score should be read as “contained runtime,” not “well defended.” The real damage path runs around the runtime entirely — through the standing service-account credentials behind the agent and the SDK and connector supply chain beneath it.

Security Findings

  • Voice agents democratize prompt injection to anyone with a phone — compromise scales with dialed numbers, not attacker skill.
  • Voice-agent output controls cannot be scored conventionally — speech goes directly to action with no document-shaped artifact in between, and conventional DLP scans text, not utterances.
  • Builder platforms are secure; the agents customers build on top of them are not — the risk is class-systemic and the platform vendor cannot reach inside customer flows to remediate it.
  • Approval phrases and spoken confirmations are too ambiguous for high-risk actions — “say yes to confirm” is a social-engineering primitive.
  • Latency pressure systematically weakens controls in production deployments — call-quality metrics inverse-correlate with safety.
  • The supply chain, not the chat box, is where the real losses happen — the class’s only confirmed in-the-wild compromises came through SDK and package-manager hijacks that harvested cloud credentials, SSH keys, and tokens. The contained conversation is a poor predictor of the platform’s actual blast radius.
  • Credentials leak through the tool layer, not the dialogue — operator API keys sitting in a tool’s environment can be read straight into the agent’s output, and outbound HTTP is frequently unrestricted by domain, so a coerced tool call exfiltrates backend secrets the conversation itself never exposed.
  • The agent acts as a standing service account with no per-action gate — refunds, resets, and CRM writes execute under shared operator credentials, so an injection that reaches a tool inherits authority no single customer interaction should ever carry.

3.8 Custom Workflow Agents Security

Custom workflow agents are dangerous because they age badly — small edits and added connectors gradually turn a simple flow into a high-authority graph no one fully understands.

Archetype

Workflow automation via no-code or low-code builders where the customer composes a custom flow across arbitrary apps. Execution is event-triggered and state is ephemeral. Technically, the class is distinguished by customer-authored delegation: the flow, its connectors, and the credentials that power them are all chosen by the team using the platform, not shipped by the vendor.

Platforms where the vendor ships an assumed process (sales outreach, support, ITSM, security triage) belong in Business Process Agents.

Risk profile

The defining risk is graph drift. Teams believe they built one automation. Over months of small edits — an added connector here, an exception branch there, a new condition to handle a customer request — they have actually built a graph of delegated authority across many apps, with branches nobody currently understands as a whole. The threat is not only a compromised model. It is the slow accumulation of mismatch between the workflow the team imagines and the workflow that is actually live.

Configuration is the dominant attack surface — the customer authors the flow, not the vendor, so the security posture is whatever the team chose to build. Blast radius is amplified by service-account sprawl: each connector adds another long-lived, broadly-scoped credential, and nobody removes them when the flow they served is retired. Defense controls are thin because the vendor can harden the platform itself but cannot reach inside customer-built flows — approval gates placed at the end of a flow rubber-stamp damage that has already executed upstream.

On the shipped default this is the rare target where entry point, loot, and exfiltration co-reside in one process. The platform decrypts a central credential vault at runtime, outbound traffic is unrestricted with no default DLP, and several members expose execution nodes that turn an injection into host code execution — so one compromise reads the whole keyring and ships it out over the platform’s own egress, with no second hop required.

Security Findings

  • Custom workflow agents are the fastest-growing silent attack surface — most CISOs cannot enumerate their own automation footprint, because each team builds its own and IT inventories none of them. A single compromised workflow delivers at platform scale — with platform credentials, to platform customer lists.
  • Inserting an LLM as the decision step in a workflow turns webhook payloads into a path to production writes — a single hostile incoming webhook can fan out into writes across every connected app, at the scale of the automation platform itself.
  • Workflow graphs age badly — small edits compound into high-authority graphs nobody fully understands months later. The risk lives in the graph nobody mapped.
  • Service-account sprawl, not credential theft, is the main blast-radius amplifier in this class. Each connector adds another long-lived, broadly-scoped service account; nobody removes them when the flow they served is retired.
  • Late approval gates protect nothing — they validate the output after the harmful cross-system actions have already executed. Approvals placed at the end of the flow rubber-stamp the damage.
  • Execution nodes turn the platform into a remote-code-execution target — code and HTTP nodes have produced host compromise, including actively-exploited code injection and zero-click unauthenticated RCE reachable across tens of thousands of public trigger endpoints, with SSRF to the cloud metadata service handing over a managed-identity token on top. The automation layer is an application-security attack surface, not just a prompt one.
  • The connector marketplace and platform SDK are an active supply-chain vector — worming package campaigns have hit automation-platform SDKs in the wild, and a poisoned connector inherits whatever the runtime-decrypted vault can reach. The weak link is routinely the third-party connector, not the vendor’s own code.
  • Containment, not control count, is what bounds the blast — the members that stay safe despite a full trifecta are the ones with no host shell and a per-session, disposable execution sandbox. Isolating code execution (micro-VM, non-root) and scoping each connector credential per workflow caps what a hijacked flow can do; a long control list on a shared, persistent runtime does not.

3.9 Business Process Agents Security

Business process agents do not need to go rogue to become dangerous — they only need to look legitimate because they inherit the trust the organization already extends to the processes they sit inside.

Archetype

Vendor-shipped process orchestration that runs an assumed business process end-to-end — support, sales outreach, ITSM, security investigation, SRE, back-office operations. The process is given; the agent fills in the details, keeps state, and hands off to humans. Includes turnkey domain operators and enterprise agent-building platforms. Technically, the class is distinguished by vendor-defined workflows with standing cross-system credentials and built-in approval chains — the agent owns outcomes, not individual steps.

If the customer composes the flow themselves across arbitrary apps, that is Custom Workflow Agents. If the agent helps one worker inside their own daily tools rather than running a process for the org, that is Work Copilot Agents.

Risk profile

This class owns outcomes, not steps — and two structural risks persist even in well-defended members. Process legitimacy laundering — wrong decisions look like ordinary case events backed by case-history evidence the agent itself wrote. Agent-as-insider authority — the agent holds standing credentials across every system the process touches, credentials no individual human would be granted.

Business Process Agents are one of the two best-defended classes in the cohort, with the most balanced defense profile — coverage spread roughly evenly across input guardrails, action controls, isolation, and monitoring. Action Controls is the class’s signature strength, inherited from the approval workflows and role-based permissions of their host platforms. The weak spot is output guardrails — and the blast radius runs above cohort average because these agents act across multiple SaaS systems with standing cross-system credentials.

Security Findings

  • Business Process is the second-best-defended class overall, with the most balanced defense profile. It spreads coverage roughly evenly across input guards, action controls, isolation, and audit.
  • Action Controls is the class’s signature strength. Approval workflows and role-based permissions are the architectural pattern these agents inherit from their host platforms. If the cohort has a “best place to deploy autonomous action,” it’s here.
  • These agents act across multiple SaaS systems and carry the credentials to do so. The risk concentrates in cross-system orchestration rather than code execution or file access (both of which are below cohort average).
  • Approval evidence is curated by the agent that requests approval. “Human in the loop” loses meaning when the agent decides what the human sees before deciding — the most consequential gap in this class’s defense story.
  • Platform products need to be scored twice — platform-as-shipped and typical-customer-build differ catastrophically. The vendor’s demo agent is not the agent your team will deploy.

3.10 Platform Operations Agents Security

The on-call engineer is now an agent — it holds a senior SRE’s standing credentials, applies fixes before a human can review them, and treats every log line an attacker can write to as operator input.

Archetype

Vendor-shipped agents that assist with or autonomously perform cloud, DevOps, SRE, platform-engineering, and infrastructure-management work — provisioning, deployment, monitoring, remediation. Technically, the class is distinguished by privileged read/write access to live production systems, telemetry pipelines, and Infrastructure-as-Code — the reliability and security of the agent itself are mission-critical.

Generic customer-built automation across arbitrary apps lives in Custom Workflow Agents. Vendor-shipped operators on a defined business process (sales, support, ITSM) live in Business Process Agents. Coding agents that produce IaC for human review (rather than applying it directly) live in Coding Agents.

Risk profile

The defining characteristic is standing privilege over live production. Unlike a Coding Agent that produces a pull request a human merges, a Platform Operations Agent often holds the credentials to apply the change directly: restart a service, scale a node pool, rewrite Infrastructure-as-Code and submit it. The blast radius of a single bad decision matches that of a privileged human SRE, but the decision is executed at machine speed against telemetry the agent itself ingests.

Monitoring is the class’s signature defense — the highest of any class, inherited from the observability and infrastructure-tooling category these agents come from. Execution isolation pairs with it to form the strongest “detect and contain” combination in the cohort. Input guardrails are the conspicuous weak spot: half the members score zero, so the same attacker who controls the log stream controls the input the agent acts on, with no filtering layer in between. Telemetry, log lines, and alert payloads are all injection channels that bypass the absent input defenses entirely.

Security Findings

  • Audit is the class’s signature defense, the highest of any class. Continuous monitoring is the architectural inheritance of the observability and infrastructure-tooling category these agents come from. Audit and execution isolation average ~2 across the class — the strongest “detect and contain” pairing in the cohort. Input guardrails are the weakest defense layer, well below other classes.
  • Standing privilege is the class signature — the agent permanently holds permissions a human SRE would only obtain via a break-glass workflow (emergency access requiring explicit escalation and leaving an audit trail).
  • Telemetry is now an injection channel — anything an attacker can write into a log line, alert payload, or trace span is potential operator input for the agent reading it.
  • Machine-speed remediation collapses the review window to zero — by the time a human reads the alert, the change is applied. The on-call engineer no longer has time to object.
  • Infrastructure-as-Code is the new system prompt — configuration committed to the repository steers agent decisions without explicit user intent; whoever owns the IaC owns the agent’s behavior.
  • The signature monitoring watches the infrastructure, not the agent’s decisions. Egress is absent or detection-only and there is no AI-layer telemetry, so an operator-equivalent takeover executed through the trusted machine identity leaves no agent-decision trail to alert on — the strong audit story is about the platform, not the reasoning that drove the action.
  • Dangerous defaults turn the review gate off. Self-healing ships without mandatory operator approval, a single setting flips review into auto-apply, and at least one deployment chart ships with read-all enabled — opt-out, not opt-in. The contained configuration is one the operator has to build.

3.11 Data Engineering Agents Security

The data engineer is now an agent. It has the warehouse password, and it runs generated SQL against tables an attacker can write rows into.

Archetype

Vendor-shipped agents that connect to, query, transform, move, or reason over enterprise data infrastructure — warehouses, lakehouses, ELT pipelines, and BI/notebook surfaces. Technically, the class is distinguished by direct credentials to critical data systems and the ability to generate and execute SQL, Python, or pipeline configurations — the agent operates on the data plane, not the compute plane.

Agents acting on the compute plane (clusters, IaC, deploys) live in Platform Operations Agents. Coding agents that produce SQL or pipeline configuration for human execution (rather than running it directly) live in Coding Agents.

Risk profile

The defining characteristic is direct credentials to governed data. Unlike a Coding Agent that produces a query for a human to review and run, a Data Engineering Agent often holds the warehouse credentials to execute the query directly — against PII tables, financial ledgers, or production analytics. The human review window between “write the query” and “the data has left the warehouse” collapses to zero.

Data Engineering Agents have the smallest attack surface of any class in the cohort — every individual attack vector scores below the cohort average, because pipelines run on operator-supplied configuration against operator-supplied sources with structurally narrow exposure to adversarial content. Blast radius is driven by the direct warehouse credentials: the agent permanently holds query and transform permissions over governed data, making unauthorized access and silent data corruption the dominant security concerns. Data content itself — rows, column descriptions, dashboard text — is the primary injection channel.

Security Findings

  • Data Engineering agents have the smallest attack surface of any class in the cohort. Every individual attack vector scores below the cohort average. Pipelines run on operator-supplied configuration against operator-supplied sources, and the scoring reflects that exposure to adversarial content is structurally narrow.
  • Direct warehouse credentials are the class signature — the agent permanently holds query and transform permissions over governed data.
  • Data content is now an injection channel — rows, comments, column descriptions, and dashboard text the agent ingests are potential operator input.
  • Silent corruption is harder than exfiltration — an agent that quietly rewrites an aggregation can poison downstream analytics for weeks before detection.
  • Generated-and-executed SQL collapses the human review window to zero — by the time a query result returns, the data has already left.
  • Row-level security is now an agent-identity problem — policy authors have to reason about “the agent acting on behalf of Alice,” not just “Alice.”
  • The approval gate decays to a single click. “Always-allow” modes and permission-bypass paths erode the human checkpoint class-wide, so the gate meant to stand between a generated query and a governed table quietly stops firing.
  • The connector and transform supply chain is the soft spot. A silent materialization override can redirect a pipeline to exfiltrate, connector code has carried server-side template injection and RCE, and marketplace connectors arrive unvetted — the data plane inherits whatever the connector was built to do.

4 Structural Findings

The class-by-class analysis above shows where risk concentrates. This section asks why — structural patterns that no single class view surfaces and that existing frameworks do not yet capture.

Several patterns repeat across agent classes: capability and defense move in opposite directions, monitoring substitutes for prevention, and defense claims resist public verification. These are not class-specific problems — they are market-level dynamics that shape how every agent score should be read. The sections below move from cohort-level findings to landscape-level cross-cuts to risk-quadrant distribution.

4.1 Agent Class Insights

Patterns that emerge when the ten agent classes are compared side by side — structural relationships between capability, defense, and risk that no individual class view reveals on its own.

  • Capability tracks attackability. The same vendors shipping the most capable agents ship the widest attack surface — a structural feature of the market, not a handful of outliers.
  • The power-protection inversion holds across classes. The two riskiest categories, Coding agents and Computer agents, pair the widest attack surfaces and largest blast radiuses with the thinnest defenses. Work Copilot and Business Process agents sit at the opposite extreme: among the most heavily defended, despite narrower exposure.
  • Exposed Giants is the most populated quadrant. 40% of the cohort sits there, and it holds 60% of the overall risk budget.
  • External Data is the universal attack surface. External Data — ingestion of documents, web pages, tickets, emails, retrieved snippets — produces indirect prompt injection on nearly every agent in the cohort. Any content the agent must read is effectively an unauthenticated command interface: the data channel and the instruction channel are one and the same.
  • Credential concentration makes blast radius multiplicative, not additive. The platform classes hold long-lived secrets decrypted at runtime, so a single compromise unlocks the whole connected estate at once. Risk scales with users × connections × triggers rather than linearly — each new connector multiplies the reach a single hijack inherits, which is why credential-holding platforms carry blast radius out of proportion to their attack surface.
  • The lethal trifecta is the default property for most agent classes. The combination of private data access, exposure to untrusted content, and the ability to execute tools appears in 98% of the cohort. Eight of ten agent classes show 100% trifecta exposure; only General Assistant Agents and Data Engineering Agents each have a single exception. For every other agent, one hostile document is enough.
  • The calmer classes are missing a leg, not defending one. The few classes that score lower — contained conversational bots, read-only analytics — earn that rating by lacking a trifecta leg, not by containing it. They have no code execution or no sensitive-data access, so the chain cannot close. Read a low score as “a leg is missing,” not as “this agent is defended”; add the missing leg and the rating moves with it.
  • Only 11% of agents are Fortified Leaders — capable enough to do meaningful work AND defended enough to deploy with confidence. The market hasn’t yet figured out how to combine power with protection.
    The agents in this quadrant skew enterprise: half come from established business platforms where defense is inherited from the surrounding product. Tenant isolation, role-based access, audit frameworks that existed long before AI was added on top. Strip that away and the picture changes — among AI-native agents built from scratch, the rate of well-defended capable systems drops close to zero.
  • The near-exceptions are telling. Several infrastructure-automation tools score higher overall than most Fortified Leaders but miss the quadrant on a technicality — the trifecta floor pins their attack score just below the threshold. They’re capable and well-controlled by every other measure. Read together with the 11 in the quadrant, they reinforce the pattern: when capability and defense do coexist, it’s because something larger than the agent is doing the defending.

4.2 Agent Landscape Insights

AIRQ scores three dimensions: attack surface, blast radius, and defense. The cross-cuts below carve the same cohort along axes it does not score — landscape-level patterns that fall outside the captured numbers but shape how those scores should be read.

Adoption Motion: Bottom-Up vs Top-Down

Adoption motion is now a security axis. How an agent reached the enterprise — through procurement, or around it — determines which of the enterprise's controls reach the agent in turn.

  • What we see: Coding agents and Computer agents rank as the top two highest attack surfaces, top two highest blast radiuses, and top two lowest defense controls. These are self-serve products with bottom-up adoption that typically bypass procurement gates. The enterprise-heavy agents with top-down adoption — Work Copilots, Business Process agents — sit at the opposite extreme. Adoption motion predicts compromise risk more reliably than any property of the underlying model.
  • Why it matters: Two agents with identical technical risk profiles can have different real-world compromise rates because one arrived through procurement and the other did not. The agent that came through procurement inherits SSO, conditional access, DLP, audit, network segmentation, and the vendor risk review. The agent installed on a personal credit card or sideloaded from a marketplace inherits none of them. Security programs that have not yet added adoption motion to their AI inventory are scoring half the relevant variable.

Identity Model: Delegated User vs Service Account

Half the AI agent landscape produces logs that say who acted. The other half produces logs that say the agent's user acted — even when a prompt injection did.

  • What we see: Delegated identity (the agent borrows the user's authentication): Coding, Work Copilot, Computer, Browser, most Chatbots. Service account (the agent has its own identity): Voice, Business Process, parts of Custom Workflow.
  • Why it matters: Delegated-identity agents show every action in the audit log as the user. That is fine when the user actually intended it, and useless when an attacker steered the agent through the user. Service-account agents have their own identity, which means actions are attributable to the agent and traceable separately. The split is invisible in vendor marketing and decisive in incident response — existing forensics cannot distinguish “Alice did X” from “prompt-injection-via-Alice did X.”

Time-to-Irreversible-Action

Some AI agents give you minutes to catch a mistake. Some give you milliseconds. Procurement treats both the same.

  • What we see: Millisecond-irreversible (Voice — refund issued; Browser — payment submitted; Computer — destructive shell command); minute-to-hour reversible (Coding — git revert; Custom Workflow — replay); reviewable-before-harm (read-only Work Copilot use).
  • Why it matters: Recovery cost should be a procurement input. It almost never is. Two agents with identical Attack Surface scores can have radically different real-world risk if one makes mistakes you can rewind and the other does not.

Platform vs Build: Vendor-Shipped vs Customer-Configured

The platform you bought is not the platform you are running. The same platform can score a few AIRQ points apart depending on default and configured controls.

  • What we see: AIRQ scores reflect a snapshot of a vendor’s reference build at a point in time; the agent your team actually deploys can score very differently. Two forces drive the gap: configuration choices on the customer side (which controls are turned on, which connectors are attached, which approval gates are wired in), and version drift on the vendor side (AI products change rapidly, and a control inventory accurate this quarter may not be accurate next quarter).
  • Why it matters: For agentic platforms, the vendor-shipped posture and the customer-configured posture are distinct setups with different risk profiles. A single procurement decision approves both, but a security team only sees one. Any platform procurement that scores the vendor and not the customer configuration is reviewing the wrong agent.

Compliance vs Defense: The Decoupling

Company compliance and agent defense are independent axes. AIRQ data shows that a vendor can be a strongly certified company with a weakly defended agent.

  • What we see: Compliance posture and agent-level defense are two independent axes in this cohort — not two indicators of the same underlying property. Once Monitoring is isolated, the correlation between vendor compliance certifications and the agent's other four defense components is near zero.
  • Why it matters: Procurement frameworks that treat SOC 2, FedRAMP, or AIUC-1 as evidence of agent defense are conflating two axes that the data shows are unrelated. The certificates audit the company that ships the agent — its HR processes, its employee access controls, its incident response. The agent's own technical defenses are scored elsewhere, by a different review, against different criteria. Treating either as a proxy for the other is the most common procurement mistake AIRQ surfaces. Buyers who pay for the certificate are routinely told they are also getting the defenses; the data shows they are not.

Detection vs Enforcement: The Defense Paradox

Controls are being documented faster than they are being enforced. The agent with the longest control list is not reliably the safest one.

  • What we see: A recurring pattern across the best-resourced vendors: guardrails that flag an attack but do not block it, isolation that is routed around, and protections that sit behind a paywall or a single bypass flag. Counting controls present rewards the agent that documents the most, which is not the same as the agent that stops the most.
  • Why it matters: Probabilistic controls on a non-deterministic model degrade rather than stop — a detection layer that never blocks is theater the buyer pays for. Procurement that scores a feature checklist will rank a detection-only agent above a quieter one that actually enforces. The defensible question is not “which controls are present” but “which controls are proven to block,” which is why enforcement evidence — red-team results, not a control list — belongs in every agent review.

Where the In-the-Wild Evidence Actually Is

The confirmed real-world incidents cluster in the supply chain. The agent-logic attacks — the prompt-injection kill chains — are real but still mostly lab-demonstrated.

  • What we see: The compromises with in-the-wild confirmation concentrate in connectors, marketplaces, packages, SDKs, and the platforms beneath the agent — classic supply-chain and application-security failures. The dramatic injection-to-action chains, by contrast, come largely from red-team and lab demonstrations rather than observed breaches.
  • Why it matters: It tells you where to spend first. The injection chains are the larger theoretical surface, but the losses on the board today arrive through the supply chain, so connector vetting, signing, and SBOMs are not future-proofing — they defend the vector already in use. It also warns against the opposite error: “lab-demonstrated” is a forecast, not an all-clear, and the gap between the two closes the moment a self-propagating agent worm crosses into a cross-fleet cascade.

What the agent can actually do: Read / Write / Execute

“AI agent” is a SKU, not a security category. Whether the agent only reads, also writes, or also executes predicts almost everything that matters about its risk.

  • What we see: Read-only (narrow Chatbot use); read-write (Work Copilot, Voice with CRM writes, Business Process); read-write-execute (Coding, Computer, Browser, Custom Workflow).
  • Why it matters: Each tier multiplies blast radius. Most procurement bundles read with write with execute under one “AI agent” line item. The taxonomy that matters most for risk is rarely evident, often buried in product descriptions, so making this distinction for evaluation is crucial.

Marketplace and Ecosystem Maturity

The fastest-growing AI marketplaces are also the most dangerous — marketplace growth structurally outpaces audit capacity.

  • What we see: Sprawling unaudited (Coding via MCP, Custom Workflow via templates, Chatbot via custom GPTs / connectors); curated vendor-managed (Business Process platform SDKs); walled-garden (managed Voice).
  • Why it matters: Marketplace velocity is the enemy of security review. The classes with the most extensions to choose from are the classes where the next supply-chain incident lives. The economics of marketplace growth structurally outpace audit capacity.

Mental-Model vs System-Reality Gap

Every AI agent class has a user mental model. None of them are accurate. That gap is where the attacks live.

  • What we see: Chatbot — user thinks “just text,” system has tools / memory / connectors. Browser — user thinks “I'm on the page,” agent reasons over DOM / accessibility tree / screenshot. Computer — user thinks “I see what's happening,” agent reasons over screenshots. Voice — user thinks “we're talking,” agent has CRM / ticket / payment APIs.
  • Why it matters: Security failures concentrate at the seam between what users believe and what systems actually do. The seam is class-specific, predictable, and almost entirely untreated by current control frameworks.

The Generational Mismatch

Two generations of security tooling, each missing a layer. Pre-AI tools were never built to inspect LLM workloads; current AI tools inspect the model but miss the agent.

  • What we see: Pre-AI security tools (network firewalls, EDR, traditional DLP) instrumented packets, processes, and files — not models. Current AI security tools (LLM firewalls, prompt-injection scanners) instrument the model — not the identity it borrows, the connectors it inherits, the workflow graph it modifies, the rules file it auto-loads, or the process state it laundered.
  • Why it matters: Every cross-cut above is downstream of this one. External Data is the universal attack surface because model-centric guardrails do not see retrieved content as input. Tool Execution is the Exposed Giants discriminator because no Gen-1 AI security tool has a concept of tool execution. Configuration is invisible to every product in either generation.

5 Final Recommendations

Security investment follows risk — but only if the risk is visible.

The recommendations below translate the AIRQ assessment into concrete, prioritized actions for teams building, buying, or governing agentic systems. They are grouped by domain — procurement, governance, inventory, and monitoring — and ordered within each group by expected risk reduction relative to implementation effort. Each one traces directly to a finding documented earlier in the report.

Procurement

  • Treat the agent, not the model, as the unit of risk. Evaluate every deployment on its full attack surface — inputs, tools, memory, oversight — rather than the underlying LLM alone.
  • Buy by class, not by vendor. Score each agent on its own posture — do not inherit trust from the vendor. A vendor’s portfolio can span multiple classes and quadrants; one well-defended product does not vouch for the rest.
  • Separate compliance posture from technical defense in procurement. Compliance certifications belong in contract review; AIRQ Defense Controls belongs in security review.
  • Score agents twice when they are platform products. Vendor-as-shipped vs typical-customer-build differ catastrophically.
  • Demand enforcement evidence, not a control checklist. The agent with the longest control list is not reliably the safest — controls are documented faster than they are enforced, and detection-only guardrails flag attacks they never block. Ask for red-team results that show a control blocks, not a feature list that says it exists, and treat “detected but not blocked” as a failing grade.
  • Read the default configuration as the product. Sandboxing, egress inspection, and audit ship off, opt-in, or paywalled, so safety is something the operator subtracts risk toward, never a default that is inherited. Score the shipped-default build, and price the work of turning the safe options on into the cost of the agent.
  • Treat the connector supply chain and tenancy model as procurement gates. Require an SBOM and a signing posture for every connector, tool-protocol server, and marketplace package, and vet, sign, and pin them before use. Assume the platform is multi-tenant until the contract proves otherwise.
  • Prize a small blast radius as much as a long control list. Containment — no host shell, a disposable per-task sandbox, scoped credentials — bounds what a hijacked agent can do regardless of how many controls it documents. Weigh how little the agent can reach, not just how much it claims to defend.

Governance

  • Assign per-agent risk ownership before deployment. Every agent needs a named owner accountable for its security posture — not the team that selected it, but the team that can revoke its access. Without clear ownership, agent-driven incidents have no escalation path.
  • Define acceptable quadrant placement by use case. Set an organizational policy that maps business criticality to minimum quadrant requirements. An Exposed Giant processing customer data should trigger a risk acceptance review; the same agent in a sandboxed lab may not.
  • Require a defense-gap review when adoption motion is bottom-up. Agents that arrived through individual sign-ups or marketplace installs bypass SSO, conditional access, DLP, and audit. Before they touch production data, require the same control baseline as procured agents.
  • Make human-in-the-loop policy explicit and enforceable. For each agent, document which actions require human approval, who the approver is, and what evidence the agent must surface before approval. Approval workflows where the agent curates what the human sees are not meaningful oversight. For irreversible identity, financial, and infrastructure actions, require dual-control — two separate principals — not a single click.
  • Treat agent credentials like service-account credentials. Scoped identity is the first of the two choke points that contain a hijacked agent — a least-privilege, per-task identity caps what the agent can reach before egress inspection ever has to fire. Agents that authenticate across systems carry standing credentials no individual employee would be granted. Apply the same lifecycle controls: rotation, scope minimization, break-glass revocation, and audit of every token the agent holds.
  • Mandate egress DLP and audit export as compliance gates. Make default-on outbound inspection and a SIEM/audit export contractual requirements, not optional hardening, and close the “guardrails behind a paywall” gap in the contract — a control the buyer cannot turn on without paying more is a control the deployment does not have.
  • Monitor safe defaults for drift and log accepted risks. The contained configuration is assembled by hand, so watch for settings that quietly flip back toward velocity, and record vendor concessions — acknowledged exfiltration channels, audit gaps — as explicit accepted-risk register items rather than letting them pass unnoted.

Inventory

  • Discover and document every AI agent in use. Inventory sanctioned and shadow agents across every class. Bottom-up adoption of agents such as Computer or Coding Agents is typically much larger and less transparent than organizations are aware of.
  • Map each agent’s data access and credential footprint. Discovery tells you an agent exists; this tells you what it can reach. For every inventoried agent, document which systems it authenticates to, whether it uses delegated user identity or a service account, which data stores it can query or write to, its connectors and tool-protocol servers, its egress destinations, and its autonomy mode. The blast radius of a compromise is defined by this map, not by the agent’s feature list.
  • Implement a class-based security policy. Define minimum required security posture per class and per quadrant, mapped to procurement gates.
  • Investigate and migrate the worst offenders. Start with the Exposed Giants quadrant — high attack surface, low defense — and move to sanctioned alternatives in the same class. Prioritize alongside it the high-blast classes — Coding, Computer, Custom Workflow, Platform Operations, Business Process — and any member that is unauthenticated by default, where the exposure ships switched on.

Protection

  • Harden the input boundary — but do not expect to close it. Prompt injection has no deterministic fix: no classifier reliably separates the agent’s data from its instructions, no agent in the cohort ships a prompt-injection classifier on by default, and vendors concede it. Apply injection defenses on every untrusted surface — web pages, documents, emails, tool outputs — as a detection layer, and log every attempt as a security event. Treat this leg as monitored, not solved; the defensive budget belongs on the legs you can close.
  • Constrain tool access by default. Start every new agent in a minimal-privilege sandbox and expand capabilities only after targeted red-teaming and policy review. For Coding Agents specifically, require OS-level isolation or procure only cloud-sandboxed variants — the local-execution-no-sandbox default exceeds typical enterprise risk thresholds without compensating controls.
  • Wrap host-resident agents in external containment. The built-in sandbox is escaped at critical severity, so place a disposable VM and an egress proxy around any agent that runs on a host — containment you operate, not containment the vendor ships inside its own process.
  • Remediate over-sharing before you enable retrieval. An agent pointed at a corpus inherits every access-control mistake in it, so fix over-share and ACL drift first — retrieval turns a latent permission error into an actively reachable one.
  • Put a human in the loop for high-impact actions. Require explicit confirmation for irreversible or outbound operations: payments, data egress, code execution, customer-facing messages. The gate rarely fails by a single switch — it decays. “Always-allow” and auto-approve modes erode the checkpoint one convenient click at a time, so ban those modes by managed policy rather than trusting users to keep the gate up.
  • Control what leaves the agent, not just what enters it. Egress is the one trifecta leg you can actually close, yet output guardrails are the weakest defense component across the cohort — the highest-leverage gap in the data. Apply egress controls, exfiltration-channel blocking, and output validation before agent-generated content reaches users, external systems, or the network. For agents with network access, domain allowlists outperform model-level hardening. Build this even behind a sandbox: a sandbox is not a DLP, and isolation does not stop exfiltration that rides an approved channel. Content masking is not the same control — redacting what a response displays is not exfiltration-grade egress inspection. Keep secrets out of the tool-output context the model can read in the first place, so a coerced response has nothing to leak. Paired with scoped identity, egress inspection breaks the majority of attack chains at two reliable choke points.

Detection & Response

  • Instrument agents like production services. Ship structured traces, tool-call audits, and behavioral baselines so anomalies can be detected, triaged, and rolled back quickly. Forward tool-call, permission, and connector activity to the SIEM, and alert on the signals that precede an agent incident: anomalous permission-mode changes, unexpected connector connections, and rendered-link or image egress.
  • Prepare for agent-specific incidents. Establish runbooks that account for delegated-identity attribution failures, autonomous action rollback, and credential revocation across every system the agent touches. When an agent misbehaves, containment means killing the session, revoking its tokens, and tracing every action back through the delegation chain — none of which existing incident-response playbooks cover. Keep autonomous mistakes recoverable: maintain snapshot/rollback and dev/prod separation so an agent that acts as its own threat actor can be undone, not just detected.
  • Treat memory, rule, and hook writes as security events. Persistence here needs no binary: an injected instruction written to agent memory, a rules file, or a hook re-arms in later, unrelated sessions. Log and alert on writes to those surfaces the way you would a new scheduled task or startup entry.
  • Re-assess against AIRQ on every major release. Re-score agents when capabilities, tools, or deployment posture change; use quadrant drift as a signal for renewed governance review. Score every agent twice — with optional features (memory, plug-ins, MCP integrations, auto-approve modes) enabled and disabled. The difference between the two scores is the real attack surface, and vendor demos default to the lower one.
  • Re-audit quarterly. CVE velocity in the AI agent market significantly exceeds traditional enterprise tooling. A quarterly cadence catches both real vulnerabilities and emerging research attention.

6 Defensible Future

The hardest problems in agent security are not bugs awaiting a patch — they are the new defaults. The good news is that you do not have to solve them to be safe.

Three facts in this report are permanent. Prompt injection has no deterministic fix; no classifier reliably separates an agent’s data from its instructions, and the vendors say so. An agent borrows the privilege of whoever deploys it, so a hijacked turn acts with real authority and a clean audit trail. And safety ships opt-in — sandboxing, egress inspection, and audit arrive off, so the secure build is one the operator assembles. These are not temporary gaps that the next model release closes. They are the ground the field now stands on, and the first step forward is to stop waiting for them to go away.

Unsolvable is not undefensible. The input leg of the attack cannot be closed — but it is only one leg, and the other two can. A scoped, per-task identity caps what a compromised agent can reach; egress inspection caps what it can send. Both are deterministic, both already exist, and together they break the majority of attack chains at two reliable choke points. The shift that matters is one of budget: stop spending it on the input fight that cannot be won, and spend it on the legs that close cleanly. An attack that cannot reach anything and cannot send anything out is contained, whether or not the injection ever lands.

This is maturity, not retreat. The enterprise has secured a powerful, fallible, over-trusted insider before — with least privilege, monitored egress, attributable identity, containment, and a gate on irreversible actions. An agent is a new instance of an old problem, and the disciplines that contained the old one still apply. Run them, and agent risk stops being a vague dread and becomes what the rest of security already is: measurable, comparable, and deliberately accepted or declined. That is the whole purpose of a score — to turn an unbounded fear into a number a business can reason about.

So adopt — with eyes open. The organizations that capture what agents offer will not be the ones that waited for prompt injection to be solved, because it will not be. They will be the ones that accepted the new defaults early, built the envelope around them, and moved on. The risk is real and the upside is larger; the path between them is engineering we already know how to do. A defensible future is not the one where agents stop being attackable. It is the one where we deploy them anyway, and contain what we cannot prevent.

7 About AIRQ

The AI Risk Quadrant is an independent, vendor-neutral assessment of the security posture of production AI agents. Its goal is to give enterprise buyers, security teams, and vendors a structured framework for comparing agents within meaningful categories — not a leaderboard, but a shared map of where risk concentrates and where defense investment pays off.

AIRQ is grounded in the Methodology, which defines three scoring axes (Attack Surface, Blast Radius, Defense Controls) and a composite AIRQ Score aligned with AI Risk Appetite. This report applies the methodology to the current agentic landscape, scoring agents that are shipping, agentic, and publicly documented. The methodology is versioned independently from the report and evolves as the attack surface evolves.

Looking ahead, AIRQ will broaden coverage as the landscape grows — adding agents, and new classes where the market produces them — and re-score existing agents as their capabilities, defaults, and defenses change. The emphasis will move further toward enforcement evidence: validating that a claimed control actually blocks rather than merely being documented, and tracking the shift from model-layer guardrails toward the architectural defenses this report argues are durable. The methodology will keep evolving alongside the attack surface it measures.

AIRQ is produced by an independent team of AI security researchers and practitioners, and it is meant to be built in the open. We invite vendors, enterprise security teams, and researchers to collaborate — correcting agent assessments, contributing public evidence and red-team results, improving the methodology, and partnering on implementation. Stronger public evidence raises the floor for everyone, not a single product. Contributions are credited; see Contribute to AIRQ for ways to participate.

Publication date
3 June 2026
Authors and reviewers
Eugene Neelou, Serge Malenkovich, Alex Polyakov, Tiffany Saade, Om Narayan, Paolo Di Prodi, Bill Stout, Apostol Vassilev, Ken Huang, Sarah Novotny