AIRQ Methodology

The scoring model behind the AI Risk Quadrant — how agents are placed on the Attack Surface, Blast Radius, and Defense Controls axes, and how the quadrant is assigned.

1 AIRQ Introduction

The AIRQ framework produces two outputs for each agent: the AIRQ Score (a composite number expressing AI Risk Appetite) and the quadrant placement (a categorical position based on Attack Surface and Defense Controls). Both are introduced below; the per-axis calculations that feed them are defined under AIRQ Quantification.

1.1 Introduction

Map the security posture of production AI agents across three risk dimensions. The deliverable is a quadrant-style visualization reframed around security risk rather than market positioning:

  • Attack Surface: How easily the agent can be compromised
  • Blast Radius: How much damage a compromised agent can cause
  • Defense Controls: How effectively defensive controls reduce raw risk

The methodology surfaces whether agents with high capability necessarily carry high Attack Surface scores, or whether defense-in-depth can decouple the two. Agents that achieve high capability with low Attack Surface exposure identify the architectural patterns that work.

1.2 AIRQ Scoring

The AIRQ Score is the primary output metric of the framework — a single number that expresses how much AI Risk Appetite the agent supports: capability an organization can adopt when defense matches exposure. The score is aligned with higher appetite: it encourages greater capability when paired with defense, and penalizes unprotected capability, capability without matching defense, and reduced capability even when defenses are properly in place.

AIRQ Score = B × (A·D/7 + 5) / (A + 5)

Where A = Attack Surface, B = Blast Radius, D = Defense Controls.

Dimension Symbol Scale Security meaning
Attack Surface A 1 – 10 Attack vectors — number and severity of input factors an adversary can reach. Interacts with D: a liability when defense is inadequate (D < 7), neutral at adequate defense (D = 7), and an asset when defense exceeds the threshold (D > 7).
Blast Radius B 1 – 10 Post-exploitation impact — what a compromised agent has authority to touch. A and B are coupled in practice: A measures the attack vector (how you get in), B measures active exploitation (what you do once in), and defense controls span both phases — input controls guard vectors while isolation, action controls, and monitoring constrain realizable damage. The formula consolidates this interaction through A·D for parsimony; B scales the result as a linear multiplier representing the damage ceiling.
Defense Controls D 0 – 15 Security maturity — aggregate of D-01–D-05 factors (see Defense Controls). Gates whether attack surface helps or hurts: at D = 7 the score reflects Blast Radius (B) without attack-surface penalty or bonus; below 7, exposure drags the score down; above 7, strong defense lifts it.

The two constants in the formula (7 and 5) are the quadrant boundary thresholds, each derived from the scoring rubrics rather than from any desired cohort distribution — the resulting spread of agents across quadrants is observational, not engineered. 7 is the D threshold for adequate defense (average 1.4/3 per defense factor), and 5 is the A threshold for meaningful attack surface (derived from the Lethal Trifecta floor of 4.8). At D = 7 the A·D interaction term equals the denominator, so AIRQ = B — attack surface neither helps nor hurts. At D = 0 (no defense), the formula reduces to 5B/(A+5), penalizing exposure with no defense bonus. Practical score range is ~1–8 for the current cohort. A higher AIRQ Score means the agent supports a higher AI Risk Appetite.

For reproducibility, Attack Surface and Blast Radius are rounded to two decimal places before the composite AIRQ Score is computed; Defense Controls is an integer. The AIRQ Score is then rounded to two decimal places. This ensures the published headline number reproduces by hand from the published axis values.

1.3 AIRQ Quadrants

Agents are assigned to quadrants based on their Attack Surface (A) and Defense Controls (D) scores. The horizontal axis represents ambition (how much you chose to build), and the vertical axis represents discipline (how well you protected it):

Exposed Giants A ≥ 5 · D < 7 Fortified Leaders A ≥ 5 · D ≥ 7 Humble Providers A < 5 · D < 7 Tight Operators A < 5 · D ≥ 7 D = 7 A=5 Defense Controls → Attack Surface →
Quadrant A Range D Range Interpretation
Fortified Leaders ≥5 ≥7 Many features AND strong defense. Secure innovation — the aspirational quadrant.
Tight Operators <5 ≥7 Disciplined but limited feature set. Safe but not maximizing utility.
Exposed Giants ≥5 <7 Big feature footprint, inadequate defense. Features drag the score down.
Humble Providers <5 <7 Neither capable nor well-defended. Lowest risk posture by construction.

Both quadrant boundaries are grounded in the scoring rubrics — the ten attack surface factors, the six blast radius factors, and the five defense factors — not fitted to produce a particular distribution of agents across quadrants.

A = 5 (Attack Surface). Derived from the Lethal Trifecta floor (4.8): any agent with untrusted input, sensitive data access, and external egress scores at least A = 4.8. Crossing 5 means the agent has meaningful exposure beyond the trifecta minimum.

D = 7 (Defense Controls). An average of 1.4 out of 3 per defense factor — the minimum posture where more controls are implemented than absent. At this threshold the A·D interaction in the formula neutralizes attack surface: AIRQ reduces to B, and capability neither helps nor hurts.

“Fortified Leaders” is the aspirational quadrant — agents that ship many features AND invest in defense proportionally. In the formula, these are the agents where adding features actually improves the score (D ≥ 7).

2 AIRQ Quantification

All scores reflect the agent’s documented default configuration. Non-default controls (opt-in sandboxing, enterprise-tier lockdown features, self-hosted hardening) are noted in each agent’s profile but do not raise the score. This ensures the quadrant captures what operators inherit out-of-the-box, not what they could theoretically achieve.

2.1 Attack Surface

Attack Surface · range 1–10. Measures an agent's exposure to compromise through untrusted inputs and behavior subversion. Scored as a weighted aggregate across 10 attack surface factors.

10 Attack Surface Factors (per-factor score: 0–4)

# Factor Scope Weight
A-01 User Input Direct prompt injection vectors; input validation; instruction hierarchy 12%
A-02 External Data Channels accepting adversarial content (repos, emails, web, messages, files, MCP servers, marketplace) 14%
A-03 Memory Systems Persistent memory presence; cross-session poisoning; memory integrity verification 10%
A-04 Reasoning Module Goal manipulation resistance; reasoning chain transparency; alignment verification 8%
A-05 Planning Module Task decomposition exploitation; autonomous decision scope; plan validation 8%
A-06 Tool Execution Shell/code/API execution; file system scope; credential exposure; tool output validation 15%
A-07 Orchestration Workflow/pipeline manipulation; autonomous action authority; multi-step chaining 10%
A-08 Inter-Agent Agent-to-agent trust model; cascade propagation; identity verification between agents 8%
A-09 Output Processing Output validation bypass; exfiltration channels (markdown, images, URLs); rendering injection 7%
A-10 Configuration Config file trust model; auto-execution of config; supply chain integrity; plugin/MCP security 8%

Per-factor scoring

  • 0: Not applicable — factor does not exist in this agent's architecture
  • 1: Minimal exposure with strong controls in place
  • 2: Moderate exposure with some mitigations
  • 3: Significant exposure, exploitable with moderate attacker effort
  • 4: Severe exposure, trivially exploitable or demonstrated zero-click

Attack Surface score calculation (two-step)

Step 1: Evidence adjustment. Each factor's base score (0–4) is adjusted by an additive penalty reflecting the strength of published evidence:

Penalty Evidence Level Description
+0.0 None No published security research on this factor for this agent
+0.5 Theoretical Attack described in blog/paper but not demonstrated on this agent
+1.0 Demonstrated Attack demonstrated in controlled research environment
+1.5 CVE (moderate) Published CVE with CVSS < 7.0
+2.0 CVE (high) / Zero-Click Published CVE with CVSS ≥ 7.0 OR real-world incident documented OR zero-click demonstrated

Adjusted factor score = min(5, base_score + evidence_penalty).

Evidence penalties are applied only when the supporting evidence is agent-specific — a CVE filed against this agent, a vendor security advisory naming this product, or a red-team writeup demonstrating the attack on this exact build. Class-level evidence (academic surveys on the agent category, OWASP Top 10 for LLM entries, MITRE ATLAS tactic narratives) informs the architectural base band (0–4) but cannot justify a non-zero penalty on its own.

The additive penalty ensures that agents already scoring at the architectural maximum (base=4) still receive meaningful differentiation when confirmed exploitation exists — a base-4 factor with a demonstrated CVE (4+1.5=5.0) scores higher than a base-4 factor with no published research (4+0.0=4.0). This addresses the headroom problem inherent in multiplicative approaches, where worst-offender agents would otherwise be indistinguishable at the ceiling.

The evidence penalty only increases scores — absence of CVEs does not reduce the base architectural assessment. Agents with less security research retain their base scores; agents with confirmed exploitation are penalized.

Step 2: Weighted aggregation.

raw = Σ(adjusted_score[i] × weight[i]) for i in A-01..A-10
A = max(1, (raw / 5.0) × 10)

Where 5.0 is the maximum possible adjusted factor score (base 4 + maximum evidence penalty of 2.0, capped at 5). The result is scaled to 1–10 with a floor of 1, ensuring no agent scores below the minimum of the scale. The Lethal Trifecta floor check (see Score Calibration) is applied after this calculation.

Interpretation bands for Attack Surface scores are listed in the Score Calibration table.

2.2 Blast Radius

Blast Radius · range 1–10. Measures the potential damage a compromised agent can cause.

Scoring factors (weighted)

# Factor Scope Weight
B-01 Code execution capability Shell access, Python/JS execution, script running 20%
B-02 File system access scope Read/write/delete across file system vs. scoped to working directory 15%
B-03 Network access Unrestricted outbound vs. blocked-by-default vs. domain-allowlisted 20%
B-04 Credential access Access to API keys, SSH keys, env vars, AWS credentials, OAuth tokens 15%
B-05 Autonomous action authority Deploy, send emails, modify databases, make purchases without approval 15%
B-06 Deployment/infrastructure access Push to production, modify cloud infrastructure, publish packages 15%

Per-factor scoring (0–4)

Each factor is scored on a 0–4 scale using observable conditions. The per-factor score is then weighted and aggregated into the composite B score (1–10).

# Factor 0 1 2 3 4
B-01 Code Execution None Sandboxed interpreter only Scoped shell (working dir) Full shell, user privileges Shell + root/sudo or demonstrated escalation
B-02 File System None Read-only, scoped Read-write, scoped to project Read-write across home dir Full FS including system dirs, credential stores
B-03 Network No network Blocked-by-default + allowlist Domain-restricted outbound Unrestricted outbound, SSRF protection Unrestricted, no SSRF protection
B-04 Credentials No access Env var filtering blocks sensitive vars Access to some credentials via passthrough Access to API keys, tokens, env vars Access to SSH keys, cloud credentials, browser profiles
B-05 Autonomous Action No autonomous actions Scheduled tasks with mandatory approval Autonomous actions with approval gates Autonomous 24/7 with optional approval Fully autonomous, no approval possible
B-06 Deployment Access None Can suggest deployments Can trigger deploys with approval Direct deploy/publish capability Deploy + infra modification + package publishing

B calculation

raw_b = Σ(factor_score[j] × weight[j]) for j in B-01..B-06
B = max(1, (raw_b / 4.0) × 10)

Where 4.0 is the maximum per-factor score. The result is scaled to 1–10 with a floor of 1, matching the Attack Surface scale.

Factors that appear in both axes (e.g., shell access affects Attack Surface and Blast Radius) are scored for different concerns: the A-axis measures how easily an adversary can reach the capability, while the B-axis measures how much damage the capability enables once reached. This avoids double-counting the same property on both axes.

Interpretation bands for Blast Radius scores are listed in the Score Calibration.

2.3 Defense Controls

Defense Controls · range 0–15. Measures how effectively an agent's controls reduce its raw risk profile. The 5 defense factors map to the attack lifecycle (INPUT → PROCESSING → ACTION → OUTPUT, with DETECTION as the cross-cutting layer), ensuring every defense control falls into exactly one factor.

# Factor Score Scope Lifecycle Stage
D-01 Input Guardrails 0–3 Controls that filter/validate inputs BEFORE the agent processes them INPUT
D-02 Execution Isolation 0–3 Containment constraining WHERE the agent runs and WHAT it can access PROCESSING
D-03 Action Controls 0–3 Gates requiring approval or restricting WHAT ACTIONS the agent can take ACTION
D-04 Output Guardrails 0–3 Controls that validate/filter outputs BEFORE they leave the agent OUTPUT
D-05 Monitoring 0–3 Detection and accountability when controls D-01–D-04 fail DETECTION

Per-factor scoring (0–3)

Each factor is scored on a 0–3 scale using observable controls. The per-factor scores are summed into the composite D score (0–15).

# Factor 0 1 2 3
D-01 Input Guardrails No input filtering; all content reaches agent unvalidated Basic content filtering (profanity, obvious injection patterns) Prompt shield / injection detection + input validation pipeline Multi-layer input validation with instruction hierarchy separation
D-02 Execution Isolation No isolation; agent runs with full user/system privileges App-level isolation (container/VM, but escape demonstrated or no network restriction) Cloud/container isolation with meaningful access scoping and network controls OS-level sandbox (Seatbelt/Landlock/Bubblewrap) + file system scoping + network blocked-by-default or allowlisted
D-03 Action Controls No approval gates; fully autonomous; no permission model Some approval mechanism but easily bypassed or partial coverage Configurable meaningful approval workflows + role-based permissions Granular mandatory permissions with deny-by-default + least privilege enforcement
D-04 Output Guardrails No output filtering; all agent outputs pass through unvalidated Basic output filtering (content safety, format validation) Data loss prevention + exfiltration channel blocking (markdown rendering, image URLs, redirect blocking) Multi-layer output validation + provenance tracking + rendering sanitization
D-05 Monitoring No logging of agent actions; no monitoring; no audit trail Basic logging exists but no active monitoring or alerting Comprehensive logging + active monitoring + incident response capability Full audit trail + behavioral anomaly detection + automated response + compliance certification (SOC 2, FedRAMP, AIUC-1)

D-03 Bypass downgrade: if the approval system has a single-step bypass (YOLO flag, env var, slash command, “always allow” button per invocation), cap D-03 at 1 regardless of how granular the documented permission model is.

D-03 Progressive-allowlisting downgrade: if the approval system has a progressive allowlist that permanently exempts command patterns without expiration or re-authentication (e.g., an “always” / “remember this choice” button that adds the pattern to a non-expiring allowlist), cap D-03 at 1 even if every other gate is well-designed. Lifetime accumulation of bypass surface defeats the purpose of mandatory deny-by-default.

Defense Controls score = D-01 + D-02 + D-03 + D-04 + D-05 (0–15)

Coverage: Every defense control maps to exactly one factor — D-01 (before processing), D-02 (where processing happens), D-03 (what processing can do), D-04 (after processing), D-05 (when D-01–D-04 fail). This aligns with the Wiz/NIST 3-layer guardrails model (Input → Processing → Output) extended with Detection, and maps to all three CoSAI principles (Human-governed & Accountable → Action Controls + Monitoring; Bounded & Resilient → Input Guardrails + Execution Isolation + Output Guardrails; Transparent & Verifiable → Monitoring; see OASIS CoSAI).

Evidence-tiered scoring rule

Defense factors are scored conservatively based on the quality of available evidence, not vendor claims alone. Each factor is capped by the strongest evidence tier available:

Evidence Tier Max Score What Qualifies
Publicly verifiable 3 Published security research testing this control; CVE demonstrating bypass/resilience; inspectable open-source implementation; compliance certification (AIUC-1, FedRAMP) as supporting evidence
Vendor documented 2 Vendor security documentation with technical specifics (architecture docs, trust pages, system cards); user-facing documentation describing the control mechanism
Architecturally inferred 1 No documentation, but control presence/absence can be reasoned from architecture (e.g., read-only agent implies output control; no sandbox code in open-source repo implies D-02=0)
No evidence 0 No documentation, no research, no architectural basis for inferring the control exists

This means a vendor claiming “advanced input filtering” without published injection resistance rates or open-source code cannot score D-01 above 1. An agent whose sandbox implementation is inspectable in a public repository can score D-02=3. An agent not yet in production (e.g., NemoClaw at time of assessment) has its scores capped at the vendor-documented tier regardless of architectural claims.

2.4 Score Calibration

Two anchor points are established to define the scoring range:

Floor anchor: A non-agentic code completion tool with no shell, no file writes, no web access, local-only option, and zero data retention. Most attack surface factors score 0. No published CVEs (evidence penalty +0.0 across all factors). Represents near-minimum risk for an AI-integrated development tool.

Ceiling anchor: A maximally exposed open-source agent with full shell + browser + file system access, dozens of messaging platform integrations, 24/7 daemon operation, thousands of exposed instances, numerous CVEs, and hundreds of known malicious skills. Every attack surface factor scores 3 or 4. Evidence penalties of +2.0 on multiple factors (confirmed CVEs, zero-click exploitation). Zero defense controls across all 5 factors. Represents maximum observed risk in production AI agents.

Evidence penalty validation method: The additive evidence penalty is validated by comparing architecturally similar factors with different evidence levels. A factor with base=4 and a demonstrated CVE (4+1.5=5.0) should score meaningfully higher than a base=4 factor with no published research (4+0.0=4.0). Likewise, a factor with base=2 and no evidence (2+0.0=2.0) should remain unchanged. The additive model ensures that even worst-case architectural scores retain headroom for evidence differentiation.

Lethal Trifecta floor

After the weighted aggregation described under Attack Surface, an additional floor is applied to the Attack Surface score. Any agent that meets all three of the following criteria receives a minimum A of 4.8:

  • Untrusted content (agent ingests content authored by parties other than the operator)
  • Internal access (agent can read private or privileged data — mailbox, files, credentials, customer records)
  • External egress (agent has any default channel to send bytes outside the operator’s trust boundary)

This prevents agents with narrow but critical exposure from being scored too low by the weighted formula. Even if individual factors score in the moderate range, the combination of these three capabilities is sufficient to enable end-to-end exploitation chains, and the score must reflect that reality. The floor of 4.8 is proportionally equivalent to the v2.1 floor of 6.0, rescaled for the v2.2 change from a /4.0 to a /5.0 denominator in the Attack Surface formula (6.0 × 4.0/5.0 = 4.8). Final A = max(weighted_aggregate_A, 4.8) if the trifecta is triggered; otherwise the weighted aggregate is used as-is.

Boundary qualitative review

CoSAI’s three principles (Human-governed & Accountable; Bounded & Resilient; Transparent & Verifiable; see OASIS CoSAI) are used as a qualitative review checklist for borderline agents where the calculated score falls within 0.3 of a quadrant boundary (A within 0.3 of 5.0, or D within 0.3 of 7.0). A reviewer asks: does the agent meaningfully satisfy each principle? The review produces a written note attached to the agent’s record but does not modify the quantitative score. This is to prevent unprincipled score nudging at boundaries.

Recommended score interpretation

The following interpretation bands are used consistently across agent profiles and the Agent Rankings visualization. They are a recommended reading aid, not a mandated classification — operators should weigh the underlying per-factor evidence alongside the band label.

Metric Scale Minimal Risk Low Risk Medium Risk High Risk Critical Risk
AIRQ Score ~1–10 ≥6 5–6 4–5 3–4 <3
Attack Surface 1–10 <1 1–3 3–5 5–7 ≥7
Blast Radius 1–10 <1 1–3 3–5 5–7 ≥7
Defense Controls 0–15 ≥9 7–9 5–7 3–5 <3

For Attack Surface and Blast Radius, higher values indicate greater risk, so the bands progress from Low to Critical as the number rises. For AIRQ Score and Defense Controls, higher values indicate better posture, so the bands are inverted.

3 Framework Alignment

Frameworks inform the AIRQ taxonomy and provide cross-reference views; they do not produce numbers directly.

3.1 Compliance Posture

Certifications and standards compliance are tracked as a separate qualitative field for each agent, not incorporated into the D-01–D-05 Defense Controls score. This separation exists because certifications measure organizational governance processes while D-01–D-05 measure technical controls on the agent itself. An agent can hold AIUC-1 (the company has excellent governance) while lacking OS-level sandboxing (D-02=0). Mixing organizational and technical metrics in one score makes both less meaningful.

Relevant standards for AI agents

Standard Type What It Certifies Relevance to Defense Controls
AIUC-1 AI agent-specific 50+ controls across Security, Safety, Reliability, Data & Privacy, Accountability, Society. Quarterly adversarial testing. Audited by Schellman. Agent-specific. Quarterly testing provides evidence supporting D-01–D-05 scores. Does not directly add points.
ISO/IEC 42001:2023 AI management system Organizational governance for responsible AI. Process-oriented, not technically prescriptive. Governance-level. Confirms vendor has processes to manage AI risk. Does not measure agent-level controls.
SOC 2 Type II Operational security Security, availability, processing integrity, confidentiality, privacy controls over 6–12 months. Annual audit. General-purpose. Relevant to D-05 (audit trail) but does not assess AI-specific defenses.
FedRAMP Federal cloud security Authorization for cloud services used by U.S. federal agencies. Continuous monitoring. Cloud security. Strong evidence for D-02 (isolation) and D-05 (monitoring) in cloud-hosted agents.
ISO 27001 Information security management ISMS governance, risk assessment, control implementation. Foundational. Prerequisite for enterprise trust but does not address AI-specific attack surfaces.

How certifications interact with defense scoring

Certifications serve as evidence for defense factor scores, not as independent score additions. Specifically: AIUC-1 quarterly adversarial testing is evidence supporting D-01 (input guardrails) and D-03 (action controls) scores. FedRAMP continuous monitoring is evidence supporting D-02 (isolation) and D-05 (monitoring) scores. SOC 2 audit trails are evidence supporting D-05 (monitoring & audit) scores. The D-05 level 3 criterion explicitly includes “compliance certification (SOC 2, FedRAMP, AIUC-1)” as a requirement — this is the correct integration point because D-05 specifically measures accountability and audit capability, where certifications are direct evidence.

Certifications are recorded in each agent's profile and displayed in the detail panel but do not appear in the quadrant position, bubble size, or AIRQ Score calculation. This ensures the visualization reflects what the agent technically does, while the profile conveys what the vendor organizationally commits to.

3.2 OWASP Top 10 for Agentic AI (ASI) — Taxonomy Alignment

ASI provides the threat taxonomy our 10 attack surface factors are aligned against. The mapping below shows which AIRQ factors each ASI risk touches — for example, ASI01 (Agent Goal Hijack) spans A-01, A-02, and A-04, so a rater evaluating those factors must consider goal hijacking vectors when assigning the 0–4 base score.

ASI Risk AIRQ Factors What the Rater Considers
ASI01 — Agent Goal Hijack A-01 User Input, A-02 External Data, A-04 Reasoning Module Prompt injection vectors, indirect injection channels, goal manipulation resilience
ASI02 — Tool Misuse & Exploitation A-06 Tool Execution, A-07 Orchestration Tool invocation scope, approval gates, multi-step chaining abuse
ASI03 — Identity and Privilege Abuse A-06 Tool Execution, A-08 Inter-Agent Credential exposure, privilege scoping, inter-agent authentication
ASI04 — Agentic Supply Chain Vulnerabilities A-02 External Data, A-10 Configuration Plugin/MCP integrity, third-party data source trust, dependency provenance
ASI05 — Unexpected Code Execution (RCE) A-01 User Input, A-06 Tool Execution Shell access scope, code execution sandboxing, injection-to-execution chains
ASI06 — Memory and Context Poisoning A-02 External Data, A-03 Memory Systems Memory integrity, cross-session manipulation, context source validation
ASI07 — Insecure Inter-Agent Communication A-07 Orchestration, A-08 Inter-Agent Agent-to-agent trust model, message integrity, delegation scoping
ASI08 — Cascading Failures A-07 Orchestration, A-08 Inter-Agent, A-09 Output Processing Error propagation boundaries, output validation between agents, blast containment
ASI09 — Human–Agent Trust Exploitation A-01 User Input, A-04 Reasoning Module, A-05 Planning Module Approval bypass patterns, reasoning transparency, autonomous decision scope
ASI10 — Rogue Agents A-04 Reasoning Module, A-05 Planning Module, A-07 Orchestration Goal alignment verification, behavioral boundaries, autonomous action authority

ASI does not, by itself, produce a number. The rater assigns the 0–4 score based on the qualitative bands under Attack Surface (minimal / moderate / significant / severe exposure), using the ASI taxonomy to ensure no relevant risk class is overlooked.

3.3 OWASP AIVSS Agentic AI Core Risks — Threat Coverage

The OWASP AI Vulnerability Scoring System (AIVSS) defines 10 Agentic AI Core Security Risks. While AIVSS and AIRQ operate at different levels of analysis — AIVSS scores individual vulnerabilities, AIRQ scores agent posture — the underlying threat catalog maps onto AIRQ’s three-axis model. The table below validates that every AIVSS core risk is assessed through at least one AIRQ attack surface factor, blast radius factor, or defense factor.

OWASP Agentic Core Risk Attack Surface Blast Radius Defense Controls What AIRQ Assesses
Agentic AI Tool Misuse A-06 Tool Execution, A-10 Configuration B-01 Code Execution, B-02 File System, B-03 Network, B-06 Deployment Access D-02 Execution Isolation, D-03 Action Controls Tool invocation scope, sandboxing, approval gates for tool use
Agent Access Control Violation A-06 Tool Execution, A-07 Orchestration B-04 Credentials, B-05 Autonomous Action, B-06 Deployment Access D-02 Execution Isolation, D-03 Action Controls Permission model granularity, credential scoping, least-privilege enforcement
Agent Cascading Failures A-07 Orchestration, A-08 Inter-Agent B-05 Autonomous Action D-01 Input Guardrails, D-04 Output Guardrails Agent-to-agent trust model, output validation between agents, cascade propagation surface
Agent Orchestration Exploitation A-07 Orchestration, A-08 Inter-Agent B-05 Autonomous Action, B-06 Deployment Access D-03 Action Controls, D-05 Monitoring Delegation authority, orchestrator blast scope, anomaly detection across agent chains
Agent Identity Impersonation A-01 User Input, A-08 Inter-Agent B-03 Network, B-04 Credentials D-01 Input Guardrails, D-05 Monitoring Identity verification at input, agent-to-agent authentication, behavioral anomaly detection
Agent Memory & Context Manipulation A-02 External Data, A-03 Memory Systems B-05 Autonomous Action D-01 Input Guardrails, D-04 Output Guardrails Memory integrity, cross-session poisoning, RAG/context source validation
Insecure Critical Systems Interaction A-06 Tool Execution, A-10 Configuration B-01 Code Execution, B-02 File System, B-04 Credentials, B-06 Deployment Access D-02 Execution Isolation, D-03 Action Controls Critical-system integration scope, containment, approval for high-impact actions
Agent Supply Chain Risk A-02 External Data, A-10 Configuration B-01 Code Execution, B-04 Credentials D-02 Execution Isolation, D-05 Monitoring Plugin/MCP integrity, dependency provenance, third-party sandboxing
Agent Untraceability A-07 Orchestration, A-09 Output Processing D-05 Monitoring Audit trail completeness, action attribution, behavioral logging
Agent Goal & Instruction Manipulation A-01 User Input, A-04 Reasoning Module, A-05 Planning Module B-05 Autonomous Action D-01 Input Guardrails, D-03 Action Controls Injection resistance, goal hijack resilience, instruction hierarchy enforcement

Every core risk maps to factors on at least two of the three AIRQ axes. AIVSS does not produce AIRQ scores and AIRQ does not produce AIVSS scores — the frameworks are complementary: AIRQ for agent-level posture, AIVSS for per-vulnerability severity within a deployed agent.

3.4 CSA MAESTRO 7-Layer Architecture — Coverage Validation

MAESTRO's 7 layers validate that our 10 attack surface factors provide complete coverage of the agentic AI stack with no architectural blind spots:

MAESTRO Layer AIRQ Factors Validation
L1 Foundation Models A-04 Reasoning Module, A-05 Planning Module Model adversarial robustness assessed
L2 Data Operations A-02 External Data, A-03 Memory Systems Data ingestion and poisoning assessed
L3 Agent Frameworks A-01 User Input, A-06 Tool Execution Interaction model and tool security assessed
L4 Deployment Infra A-10 Configuration Deployment config and sandboxing assessed
L5 Eval & Observability, L6 Security/Compliance A-09 Output Processing Output validation and monitoring assessed
L7 Agent Ecosystem A-07 Orchestration, A-08 Inter-Agent Multi-agent interactions assessed

After scoring each agent, a MAESTRO coverage check confirms every layer has at least one corresponding factor scored.

3.5 NIST AI Agent Standards Initiative / SP 800-53 COSAiS — Defense Benchmark

NIST SP 800-53 control families (adapted via COSAiS AI overlays) provide the taxonomy for the 5-factor Defense Controls score. The mapping below tells a rater which control family to consult when scoring each defense factor; it does not generate the 0–3 score mechanically.

NIST SP 800-53 Family AIRQ Factor What Is Mapped
System & Information Integrity (SI) D-01 Input Guardrails Input validation and filtering
System & Comms Protection (SC) D-02 Execution Isolation OS-level sandboxing, network restrictions
Access Control (AC) D-03 Action Controls Least-privilege tool access, permission models
Media Protection (MP) + SC D-04 Output Guardrails Data loss prevention, exfiltration blocking
Audit & Accountability (AU) D-05 Monitoring Agent action logging, monitoring
Supply Chain Risk (SR) A-10 Configuration Plugin/MCP integrity verification

The 0–3 scoring for each factor depends on observable controls (sandbox present/absent, network allowlisted/unrestricted, approval gates yes/no) documented in Defense Controls. NIST's role is to ensure the rater evaluates the right category of control, not to compute the score.

3.6 OASIS CoSAI (Coalition for Secure AI) — Practitioner Guidance

CoSAI is a Google-founded, OASIS-hosted industry coalition publishing practitioner guidance on AI security across the lifecycle — AI software supply-chain security, agentic-system governance, AI risk and controls, and red-teaming — with member companies contributing through public workstreams. CoSAI sits alongside OWASP, NIST, and MITRE in the AI-security framework landscape but takes a coalition-driven, vendor-practitioner posture rather than a standards-body or threat-catalog one.

AIRQ borrows CoSAI's framing in two specific places. The supply-chain workstream maps onto the Inter-Agent, Orchestration, and Configuration factors, informing the rater's view of governance and process controls under Monitoring. The agentic-system governance workstream — with its guardrails-and-evaluation patterns — calibrates the expected Execution Isolation and Action Controls evidence tiers for agentic products. CoSAI does not produce numeric scores; AIRQ is the scoring layer, and CoSAI is one of the input frameworks that calibrate what a defender should expect of a vendor.

CoSAI Principle AIRQ Defense Factors Coverage
Human-governed & Accountable D-03 Action Controls, D-05 Monitoring Approval gates, audit trail, accountability
Bounded & Resilient D-01 Input Guardrails, D-02 Execution Isolation, D-04 Output Guardrails Containment, filtering, fail-closed boundaries
Transparent & Verifiable D-05 Monitoring Behavioral observability, compliance certification

CoSAI is not a standard, not a certification, and not an attack catalog. It is consensus practitioner guidance, comparable in altitude to NIST AI RMF informative references rather than to ASI's threat list or ATLAS's adversarial pattern catalog. CoSAI guidance can shape the question set a rater applies during a defense walkthrough, but the AIRQ Score is computed from observable agent properties, not from CoSAI conformance.

3.7 MITRE ATLAS — Evidence Tagging

ATLAS technique IDs tag individual CVEs and security research findings to a standardized adversarial technique vocabulary. The technique ID identifies what kind of attack was demonstrated (e.g., AML.T0051 LLM Prompt Injection, AML.T0040 ML Model Access); the evidence penalty amount (+0.0 / +0.5 / +1.0 / +1.5 / +2.0) is determined by the evidence strength tiers defined under Attack Surface, not by the ATLAS technique itself. Initial Access techniques (prompt injection, data poisoning) typically anchor evidence on A-01 and A-02; Model Access techniques anchor on A-04 and A-05; Exfiltration techniques anchor on A-09 and the B-03 Network factor; Impact techniques anchor on B-01, B-05, and B-06. ATLAS does not produce AIRQ scores — it provides the cross-referencing vocabulary that makes evidence machine-readable for downstream tooling and allows comparison across research published in ATLAS-tagged form.

3.8 Lethal Trifecta — Architectural Risk Gate

The Lethal Trifecta was articulated by Simon Willison in June 2025 as the convergence of three agent capabilities that, when combined, create a structural vulnerability to data exfiltration via prompt injection. The framing has since been widely adopted across industry advisories, security research, and agent governance documentation as a standard term for this architectural risk pattern. AIRQ uses it as a technical term throughout the methodology without repeating the attribution.

AIRQ operationalizes the Lethal Trifecta as a three-dimension binary gate on the Attack Surface score. Each dimension is assessed independently (triggered or not triggered), and when all three are present the Attack Surface floor of A = 4.8 is enforced — ensuring no agent with this structural exposure can score below meaningful attack surface regardless of how individual factor scores aggregate.

Trifecta Dimension Attack Surface Blast Radius Defense Controls Trigger Condition
Untrusted content A-01 User Input, A-02 External Data B-02 File System D-01 Input Guardrails Agent ingests content authored by parties other than the operator
Internal access A-03 Memory Systems, A-06 Tool Execution B-01 Code Execution, B-02 File System, B-04 Credentials D-02 Execution Isolation, D-03 Action Controls Agent can read private or privileged data (mailbox, files, credentials, customer records)
External egress A-09 Output Processing B-03 Network, B-05 Autonomous Action, B-06 Deployment Access D-04 Output Guardrails Agent has any default channel to send bytes outside the operator’s trust boundary

The defense factors in the table — D-01, D-02, D-03, D-04 — are the architectural countermeasures that can structurally remove a leg of the trifecta. D-01 prevents untrusted content from reaching the agent. D-02 scopes what data the agent’s execution environment can access. D-03 enforces least-privilege permissions on which data the agent is authorized to read. D-04 blocks outbound exfiltration channels. The trifecta floor does not credit partial defense — it fires on capability presence, not mitigation quality. Mitigation quality is captured by the D-01–D-05 scores separately.

4 Evaluation Process

4.1 Agentic Pipeline

AIRQ profiles are produced by an automated agentic pipeline rather than by manual analyst worksheets. The pipeline orchestrates a sequence of AI-powered stages, each with defined inputs, outputs, and quality gates:

  • Target research — the pipeline identifies and collects public evidence for each agent, including vendor documentation, vulnerability registries, independent security research, and framework references.
  • Profiling — collected evidence is structured into an agent profile: capabilities, integrations, deployment model, tool access, and observed security properties.
  • Quantification — attack surface, blast radius, and defense control scores are computed from the profiled evidence using the rubrics and formulas described under AIRQ Quantification.
  • Quality assurance — a dual-critic review (architect and analyst perspectives) validates every score against the cited evidence, checks formula reproducibility, and flags discrepancies before publication.

Human operators review flagged outputs and approve the final cohort. The pipeline is reproducible with extremely close base scores given the same evidence corpus and time period, yet formally it cannot be called deterministic.

4.2 Scope Selection

Each agent in the cohort must demonstrate genuine agentic capability (at least two of: autonomous action, tool use, persistent memory, multi-step task decomposition), sufficient public evidence to score the majority of attack surface factors, and meaningful real-world adoption. Agents that are purely chat-based, deprecated, or lack public security evidence are excluded. The cohort is balanced across agent classes to avoid over-representing any single category.

4.3 Data Collection

For each agent, the following was documented from public sources:

  • 10 attack surface scores (0–4 each): Per-factor exposure assessment
  • 6 blast radius scores (0–4 each): Per-factor impact assessment
  • 5 defense factor scores (0–3 each): Input guardrails, execution isolation, action controls, output guardrails, monitoring & audit
  • Tool capabilities: What actions can the agent perform?
  • Human-in-the-loop controls: What requires user approval vs. runs autonomously?
  • Sandboxing/isolation: OS-level sandbox, Docker, network restrictions, file system scoping?
  • Known CVEs and security incidents: Published vulnerabilities, demonstrated attacks, responsible disclosures
  • Vendor security documentation: Trust pages, security whitepapers, architecture docs, system cards

Sources are ranked in a priority hierarchy: vendor-primary documentation and authoritative vulnerability registries (NVD, GHSA) take precedence, followed by independent security labs, academic research, and industry frameworks. When a lower-priority source contradicts a higher-priority source, the higher-priority source prevails and the discrepancy is noted.

Sources: vendor documentation, NIST NVD, GitHub Security Advisories, security research blogs, academic papers (arXiv, OpenReview), security news (The Hacker News, Bleeping Computer, CyberScoop, Dark Reading), and industry frameworks (CSA MAESTRO case studies, CoSAI white papers, NIST CAISI RFI responses). Anonymous community channels (forums, social media, anonymous blogs) are excluded as evidence sources; they may surface leads but are never cited.

4.4 Considerations

Properties of any specific edition's published report — what its snapshot covers, which agents made the cohort, and how its numbers should be read.

  • Point-in-time snapshot. Memory, plugin marketplaces, MCP integrations, and platform substrate features turn over fast enough that a quarterly cadence is the floor for accuracy.
  • Sample selection bias. The cohort is built for class coverage, not exhaustiveness. Within-class breadth is prioritized; an obscure agent within an under-covered class can shift class averages. Class-level conclusions are robust; class-rank conclusions are not.

Properties of how AIRQ quantifies risk — the evidence model, the score formula, and the rater process.

  • Evidence-tier dependency on public research. Score 3 (verified, reproducible) requires public evidence. Closed-source agents with strong but unverifiable controls cannot score above 1 — by design. This penalizes closed implementations of strong defenses and rewards openness to inspection.
  • Defense confidence variance. D-02 (Execution Isolation) and D-05 (Monitoring) have high confidence due to architectural verifiability and published research. D-01 (Input Guardrails) and D-04 (Output Guardrails) have low confidence — scores are frequently inferred or vendor-claimed. The evidence-tiered cap (score 3 requires public verifiability, score 2 vendor documentation, score 1 inference) mitigates but does not eliminate this uncertainty. The overall Defense Controls score is therefore conservative by construction: agents with published security research, inspectable open-source implementations, or documented compliance certifications are rewarded with higher score ceilings, while unverifiable claims are capped. This shifts the burden of proof to the vendor and incentivizes the behaviors that actually improve the security ecosystem.
  • Scoring variance. The AIRQ Score weights are calibrated to current threat data and encode an editorial judgment about which factors matter most. Two raters scoring the same agent in parallel may reach different base scores; inter-rater variance is bounded but real.