Microsoft 365 Copilot Agent Security Risks

Work Copilot Agents · microsoft.com · Exposed Giants
AI Risk Quadrant position: Exposed Giants (Defense Controls 5, Attack Surface 5.72). Other quadrants: Fortified Leaders, Humble Providers, Tight Operators.
AIRQ Score: 4.56 (High)
Attack Surface: 5.72 (High)
Blast Radius: 5.38 (High)
Defense Controls: 5 (High)
About the Agent

Microsoft 365 Copilot operates as a hosted LLM assistant embedded across the productivity suite, processing email, documents, calendar, and chat through retrieval-augmented generation over organizational data. The agent grounds every response using the authenticated user's full permission scope without per-query approval, connecting to a wide range of plugins and connectors that extend its action surface into third-party services. The runtime executes server-side within tenant-scoped infrastructure managed entirely by the vendor.

About the AI Risk Quadrant

Exposed Giants agents combine a moderate attack surface with a moderate blast radius. In these agents, the documented default configuration exposes organizational data through confirmed exploitation chains that do not require elevated attacker sophistication. Microsoft 365 Copilot exemplifies this quadrant. A demonstrated zero-click prompt injection reaches internal files and credentials through auto-fetch side channels. Partial input guardrails reduce, but do not eliminate, the exploitation surface that independent researchers have repeatedly validated.

1 Key Risks

The most critical security risks an operator inherits when deploying this agent in its documented default configuration. Confirmed zero-click prompt injection chains expose the full organizational data estate through credential-equivalent access. Output guardrails and monitoring are absent in the default configuration.

Key Input Risks
The agent ingests email bodies, shared documents, calendar items, and web content through retrieval-augmented generation grounding over organizational data in its default configuration. A confirmed zero-click indirect prompt injection [1] demonstrated that an attacker-controlled email alone triggers processing without any user interaction.
Key Execution Risks
The agent executes server-side within hosted infrastructure without an operator-accessible sandbox or configurable isolation boundary. No public red-team assessment of the execution tier has been published, and no third-party audit of tenant-scoped compute isolation is independently verifiable.
Key Action Risks
The agent holds delegated OAuth scopes matching the authenticated user's full permission set, enabling read and write access to email, files, and calendar without per-action operator approval. The highest-blast-radius scope grants credential-equivalent access across the entire organizational data estate [1].
Key Output Risks
The agent renders markdown, including auto-fetched image URLs, and can invoke plugins that make outbound HTTP requests in the default configuration. A confirmed exfiltration chain [1] transmitted internal file content to an attacker-controlled server via a proxy domain that bypassed link redaction.
Key Monitoring Risks
The agent logs interactions to Purview Unified Audit Log, but anomaly detection and SIEM forwarding for injection attempts require manual configuration in the admin center. The operator's blind spot is real-time detection of indirect injection executing within a legitimate user session.
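The auto-fetch channel works because a client that renders a markdown image issues an HTTP GET for the image URL as soon as the response is displayed. Anything the model writes into that URL therefore leaves the tenant without a click.

The sketch below is a minimal, illustrative output filter, not a Microsoft control. It assumes responses are available as plain markdown strings and uses a hypothetical operator allowlist, `ALLOWED_IMAGE_HOSTS`. It handles both inline images and reference-style definitions, because a filter that only matches the inline form can be sidestepped by the reference form.

Host allowlisting alone does not stop exfiltration through an allowlisted domain that proxies arbitrary URLs, which is the path described above. Such domains must stay off the list.

```python
import re
from urllib.parse import urlparse

# Hypothetical operator allowlist: only hosts that serve static, tenant-approved
# images and never relay arbitrary URLs belong here.
ALLOWED_IMAGE_HOSTS = {"static.contoso.example"}

# Inline images: ![alt](url "optional title")
INLINE_IMAGE = re.compile(r'!\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)')
# Reference-style definitions on their own line: [label]: url "optional title"
REFERENCE_DEF = re.compile(
    r'^[ \t]{0,3}\[([^\]]+)\]:[ \t]*<?(\S+?)>?(?:[ \t]+.*)?$', re.MULTILINE)


def host_allowed(url: str) -> bool:
    """True only for absolute http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.hostname in ALLOWED_IMAGE_HOSTS


def strip_untrusted_images(markdown: str) -> str:
    """Remove inline images and reference definitions that point off the allowlist."""
    def inline_sub(m: re.Match) -> str:
        return m.group(0) if host_allowed(m.group(2)) else f"[image removed: {m.group(1)}]"

    def ref_sub(m: re.Match) -> str:
        # Dropping the definition leaves any ![alt][label] unresolved, so the
        # renderer has no URL to fetch. Non-image link definitions are dropped too.
        return m.group(0) if host_allowed(m.group(2)) else ""

    markdown = INLINE_IMAGE.sub(inline_sub, markdown)
    return REFERENCE_DEF.sub(ref_sub, markdown)
```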

2 AIRQ Scores

The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. Confirmed exploitation chains against partial defenses produce a moderate composite score anchored by credential-equivalent blast radius and bypassable input filtering.

AIRQ Metrics

Operators deploying this agent inherit confirmed data exfiltration paths that reach the full organizational data estate despite partial input filtering, placing the risk profile where attack capability outpaces defense maturity.

The four metrics below capture the gap between the agent's broad data access and its incomplete default-on protection against confirmed attack patterns.

Metric | Score | Comments
AIRQ Score | 4.56 | Moderate composite risk driven by confirmed exploitation against partially defended credential-equivalent access.
Blast Radius | 5.38 / 10 | File system, network, and credential factors at maximum reflect full organizational data inheritance.
Attack Surface | 5.72 / 10 | Three surfaces at adjusted ceiling from confirmed zero-click injection and data exfiltration vulnerabilities.
Defense Controls | 5 / 15 | Input filtering and execution isolation partially present; output guardrails confirmed bypassable against targeted evasion.

3 Attack Surface

Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. Three attack surfaces reach the adjusted ceiling. This is driven by confirmed zero-click prompt injection and auto-fetch exfiltration chains that bypassed deployed input and output filters.

Attack Surface Metrics

Scores reflect base architectural exposure plus evidence penalties from confirmed agent-specific vulnerabilities and demonstrated exploitation research.

Each row measures how attacker-controlled input reaches the reasoning loop through that channel, with penalties applied where confirmed exploitation exists.

Surface | Score | Comments
User Input | 5 / 4 | Direct prompt channel accepts arbitrary text; indirect injection via crafted email demonstrated zero-click bypass of cross-prompt injection filters [1]. A separate validation flaw confirmed unauthorized network-based information disclosure without authentication [2]. Both vulnerabilities received critical severity ratings and were patched server-side.
External Data | 5 / 4 | Retrieval-augmented generation ingests email, documents, and web content from organizational storage without per-item validation. Confirmed exploitation demonstrated that attacker-controlled email content triggers automatic tool invocation [6] and data staging without user interaction [1] [5]. The grounding pipeline processes external content identically to trusted internal documents.
Memory | 3 / 4 | Declarative memory stores durable facts across sessions. Confirmed exploitation demonstrated that prompt injection can add and delete memory entries [3] [7], enabling persistent influence over future responses. The attack chain showed how manipulated memory enables long-term data exfiltration through CSS-based side channels.
Reasoning | 2 / 4 | The reasoning layer operates on mixed trusted and untrusted context without documented separation between system instructions and grounded content [9]. No public red-team of reasoning-layer isolation has been published. The base exposure reflects standard LLM context-mixing risk without confirmed agent-specific exploitation.
Planning | 1 / 4 | Single-turn interaction model without multi-step autonomous planning in the default configuration [9]. The agent does not decompose tasks into sub-goals or execute multi-step chains without explicit user direction, limiting the planning surface to direct request-response cycles.
Tool Execution | 1 / 4 | Tools are limited to organizational API calls scoped to existing user permissions [9]. No shell, browser, or code execution sandbox is exposed in the default configuration. The constrained tool surface restricts exploitation to data access rather than arbitrary code execution.
Orchestration | 2 / 4 | Internal orchestration coordinates retrieval, grounding, and response generation across organizational API endpoints [10]. The orchestration layer is not operator-configurable and runs within vendor-hosted infrastructure, limiting attacker manipulation to indirect influence through grounded content.
Inter-Agent | 2 / 4 | The agent connects to plugins and third-party connectors that extend its action surface [15]. Plugin interactions inherit the user's authentication context without additional per-plugin consent gates. Independent research demonstrated that these connections enable remote execution chains triggered by email [8].
Output Processing | 5 / 4 | Markdown rendering with auto-fetched image URLs enables side-channel exfiltration [16]. A confirmed exploit [1] bypassed link redaction through Unicode manipulation and proxy domain routing. A separate vulnerability [4] confirmed information disclosure through improper neutralization of output consumed by downstream components.
Configuration | 2 / 4 | Tenant administrators control availability and data access policies through a centralized admin center [9]. Individual user-level security configuration is limited; default permissions inherit the user's full organizational API scope without per-resource restriction.

The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session. Together, these three conditions turn an isolated prompt injection into full-chain exfiltration. This agent processes attacker-controlled email and document content. It accesses the authenticated user's full organizational data estate, including files and credentials. It also transmits bytes externally through auto-fetched URLs and plugin HTTP calls in its default configuration.

Lethal Trifecta · Complete (3 of 3)

Microsoft 365 Copilot exhibits all three of these conditions in its documented default configuration:

  • Untrusted input (confirmed): attacker-controlled email content triggers processing without user interaction, as demonstrated by the zero-click indirect prompt injection chain that bypassed deployed cross-prompt injection filters [1].
  • Sensitive data (confirmed): the agent accesses the full organizational data estate through the authenticated user's permission scope, including email, files, calendar, and credentials, without per-resource classification gates on grounding [9].
  • External egress (confirmed): auto-fetched markdown image URLs and plugin HTTP calls transmit bytes outside the operator's trust boundary, as demonstrated by data exfiltration to an attacker-controlled server through a proxy domain [1].
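The trifecta is a conjunction, so removing any single leg breaks the full exfiltration chain even while the other two remain. The minimal sketch below expresses that check. The field names are illustrative assumptions rather than any vendor schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCapabilities:
    """Capabilities present in one agent session (illustrative field names)."""
    processes_untrusted_input: bool  # e.g. grounds on inbound external email
    accesses_sensitive_data: bool    # e.g. reads files under the user's scope
    has_external_egress: bool        # e.g. auto-fetched image URLs, plugin HTTP calls


def trifecta_conditions(caps: SessionCapabilities) -> int:
    """Count how many of the three Lethal Trifecta conditions hold."""
    return sum([caps.processes_untrusted_input,
                caps.accesses_sensitive_data,
                caps.has_external_egress])


def trifecta_complete(caps: SessionCapabilities) -> bool:
    """All three together turn a single injection into a full exfiltration chain."""
    return trifecta_conditions(caps) == 3


# Default configuration as described above: all three conditions hold.
copilot_default = SessionCapabilities(True, True, True)
# Blocking one leg (here, egress) leaves 2 of 3 and breaks the chain.
egress_blocked = SessionCapabilities(True, True, False)
```

In this framing, the output and network controls in the hardening tips target the egress leg. That is usually the most practical leg to cut, because the other two are intrinsic to what the agent is for.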

4 Blast Radius

The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. File system, network, and credential access factors at maximum reflect that compromise of the agent's reasoning grants access equivalent to the authenticated user's full organizational permissions.

Blast Radius Metrics

Scores reflect the damage an attacker achieves after successfully injecting into the agent's context, measured by the resources reachable from that position.

Each factor measures the scope of organizational resources an attacker can reach or exfiltrate through a successfully compromised agent session.

Factor | Score | Comments
Code execution | 1 / 4 | No code execution sandbox exposed in the default configuration [9]. The agent generates text responses and invokes organizational API calls but does not execute arbitrary code, limiting post-compromise actions to data access and messaging.
File system access | 3 / 4 | Full read access to organizational file storage within the user's permission scope. Confirmed exploitation [1] demonstrated exfiltration of internal document content through the auto-fetch side channel without additional authentication or user approval.
Network access | 3 / 4 | Outbound network access through auto-fetched image URLs and plugin HTTP calls [14]. Confirmed exploitation [1] demonstrated data transmission to an external server through a proxy domain that bypassed content security policy restrictions in the default configuration.
Credential access | 3 / 4 | The agent operates with the authenticated user's full OAuth token, granting access equivalent to the user's own credential across email, files, calendar, and messaging [13]. Confirmed exploitation [1] accessed content through these delegated scopes without re-authentication.
Autonomous action | 2 / 4 | The agent can draft messages, create events, and modify files within the user's scope [6]. Actions that send outbound communications require user confirmation in the default configuration, though indirect prompt injection via malicious email demonstrated automatic tool invocation bypassing this gate.
Deployment access | 1 / 4 | No deployment, infrastructure management, or CI/CD capabilities in the default configuration [9]. The agent operates within the productivity boundary without access to resource management or code deployment pipelines.

5 Defense Controls

Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. Input filtering and execution isolation are partially present but confirmed bypassable, while output guardrails and monitoring have nothing effective in place against demonstrated attack vectors.

Defense Controls Metrics

Scores reflect what the default configuration provides without operator customization, using an inverted scale where higher values indicate stronger protection.

Each component measures whether a specific defensive control operates by default and whether confirmed exploitation has demonstrated its bypass.

Component | Score | Comments
Input Guardrails | 2 / 3 | Prompt Shields and cross-prompt injection classifiers are deployed as default-on input filters [10]. However, confirmed exploitation [1] demonstrated complete bypass through Unicode manipulation, indicating the control exists but fails against targeted evasion by a motivated attacker with knowledge of the filter architecture.
Execution Isolation | 2 / 3 | Server-side hosted execution with tenant-scoped compute isolation is documented in vendor architecture materials [9] [11]. No public third-party audit of the isolation boundary exists, and the operator cannot configure sandbox parameters or inspect execution logs for the grounding pipeline.
Action Controls | 1 / 3 | Send-action confirmation gates exist for outbound email in the default configuration. Independent research [6] demonstrated tool invocation and data staging without user approval through indirect prompt injection, indicating the control is partially bypassable.
Output Guardrails | 0 / 3 | Link redaction is intended to prevent data exfiltration via URLs in responses. The EchoLeak chain [1] defeated this control entirely through Unicode obfuscation and proxy domain routing, proving no effective default-on output sanitization exists against targeted evasion vectors.
Monitoring | 0 / 3 | Purview Unified Audit Log captures interactions for compliance review. No documented default-on anomaly detection for prompt injection patterns or data exfiltration attempts within sessions exists [12]. Operators must configure SIEM integration and alerting rules manually.
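The Defense Controls headline of 5 / 15 matches a plain sum of the five component scores, each out of 3. The report does not state its aggregation formula, so the tally below assumes a simple additive model. It also lists the components that contribute nothing by default.

```python
# Default-configuration component scores from the table above
# (inverted scale: higher means stronger protection, 3 is the per-component maximum).
DEFENSE_COMPONENTS = {
    "Input Guardrails": 2,
    "Execution Isolation": 2,
    "Action Controls": 1,
    "Output Guardrails": 0,
    "Monitoring": 0,
}
MAX_PER_COMPONENT = 3


def defense_total(components: dict) -> tuple:
    """Assumed additive aggregation: (sum of scores, maximum attainable)."""
    return sum(components.values()), MAX_PER_COMPONENT * len(components)


score, ceiling = defense_total(DEFENSE_COMPONENTS)
absent = [name for name, s in DEFENSE_COMPONENTS.items() if s == 0]
print(f"Defense Controls: {score} / {ceiling}")  # Defense Controls: 5 / 15
print("Absent by default:", ", ".join(absent))   # Output Guardrails, Monitoring
```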

6 Hardening Tips

Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. Operators can materially reduce the confirmed exploitation surface by restricting data access scope, enforcing output controls, and enabling detection capabilities that are available but not active by default.

Input Guardrails

Input guardrails intercept adversarial content before it reaches the reasoning loop.

  • Policy: Restrict agent access to sensitivity-labeled content using information protection policies, preventing grounding on documents above a configured classification tier; counters the unrestricted retrieval-augmented generation that enabled organizational data exfiltration.
  • Configuration: Deploy data loss prevention policies scoped to the agent location to block responses grounded on content matching sensitive information types defined by the organization; counters the absence of per-item classification gates on ingested content.
  • Engineering: Implement conditional access policies requiring compliant devices and named locations for agent access, reducing the attack surface for credential-based injection delivery; counters the zero-click email delivery vector demonstrated in confirmed exploitation.
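The first policy tip amounts to a classification gate placed between retrieval and grounding. The minimal sketch below shows that gate. It assumes each retrieved item carries a sensitivity label and uses a hypothetical tenant label ordering. The label names are illustrative, not a Purview API. In practice, enforcement happens inside the vendor pipeline through Purview policies rather than in operator code.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical tenant taxonomy, ordered from least to most sensitive.
LABEL_RANK = {"Public": 0, "General": 1, "Confidential": 2, "Highly Confidential": 3}
UNKNOWN_RANK = max(LABEL_RANK.values()) + 1  # unrecognized labels fail closed


@dataclass
class RetrievedItem:
    source: str
    label: Optional[str]  # None means the item carries no sensitivity label
    text: str


def gate_for_grounding(items: List[RetrievedItem], max_label: str = "General",
                       allow_unlabeled: bool = False) -> List[RetrievedItem]:
    """Keep only items at or below the configured classification tier."""
    ceiling = LABEL_RANK[max_label]
    kept = []
    for item in items:
        if item.label is None:
            # Unlabeled content is excluded unless the operator opts in.
            if allow_unlabeled:
                kept.append(item)
        elif LABEL_RANK.get(item.label, UNKNOWN_RANK) <= ceiling:
            kept.append(item)
    return kept
```

Treating unlabeled and unrecognized labels as excluded by default is the conservative choice here. It trades some answer quality for a gate that cannot be bypassed simply by stripping a label.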

Execution Isolation

Execution isolation contains what a compromised agent can do on the host.

  • Policy: Scope agent licensing to security groups with minimum-necessary organizational API permissions rather than deploying organization-wide; counters the full permission inheritance that grants credential-equivalent access to every licensed user.
  • Configuration: Enable audit log forwarding to a SIEM with alerts on anomalous interaction patterns, including high-volume file access and unusual time-of-day usage; counters the absence of default-on execution monitoring for injection indicators.
  • Engineering: Review and restrict third-party plugin deployments through integrated apps management controls; counters the extended action surface from unvetted connector connections that inherit user authentication context.
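Minimum-necessary permissions are easier to reason about as a set difference: compare what a role needs with what the delegated grant actually carries. The sketch below assumes the operator has already exported the delegated scopes for a user. The scope strings are Microsoft Graph permission names used for illustration, and the role-to-scope mapping is hypothetical.

```python
# Hypothetical role-to-scope mapping; scope strings are Microsoft Graph
# delegated permission names used for illustration only.
ROLE_REQUIRED_SCOPES = {
    "analyst": {"Files.Read.All", "Calendars.Read"},
    "assistant": {"Mail.Read", "Mail.Send", "Calendars.ReadWrite"},
}

# Scopes that let an injected instruction write or send rather than only read.
WRITE_OR_SEND_SCOPES = {
    "Mail.Send", "Mail.ReadWrite", "Files.ReadWrite.All", "Calendars.ReadWrite",
}


def excess_scopes(role: str, granted: set) -> set:
    """Scopes present in the grant but not required by the role (unknown role: all)."""
    return set(granted) - ROLE_REQUIRED_SCOPES.get(role, set())


def risky_excess(role: str, granted: set) -> set:
    """The subset of excess scopes that widens the post-injection blast radius."""
    return excess_scopes(role, granted) & WRITE_OR_SEND_SCOPES
```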

Action Controls

Action controls govern which tools and actions the agent can invoke autonomously.

  • Policy: Disable agent email-send capabilities, through transport rule enforcement, for users who do not require AI-assisted outbound communication; counters the demonstrated tool invocation bypass that stages exfiltration via message drafts.
  • Configuration: Restrict agent file-write permissions using site-level access policies limiting which document libraries the agent can modify; counters the autonomous file modification capability within the user's full organizational scope.
  • Engineering: Deploy cloud application session policies that block agent-initiated downloads of documents matching data loss prevention patterns; counters the file system blast radius from credential-equivalent organizational API access.
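The action-control tips aim at one property. An outbound or write action should proceed only on a confirmation that comes from the user, never on text that arrives through grounded content.

The minimal sketch below shows such a gate. It assumes a hypothetical tool-call record with an `action` name and a `confirmed_by_user` flag that only the client UI can set. The action names are illustrative.

```python
from dataclasses import dataclass, field

# Illustrative action names: anything that leaves the tenant or mutates data.
REQUIRES_CONFIRMATION = {"send_mail", "create_event", "write_file", "share_link"}


@dataclass
class ToolCall:
    action: str
    args: dict = field(default_factory=dict)
    # Set only by the client UI after an explicit user click; model output
    # must never be able to set this flag.
    confirmed_by_user: bool = False


class ConfirmationRequired(Exception):
    """Raised when a gated action lacks out-of-band user confirmation."""


def authorize(call: ToolCall) -> ToolCall:
    """Pass read-only actions through; block gated actions without confirmation."""
    if call.action in REQUIRES_CONFIRMATION and not call.confirmed_by_user:
        raise ConfirmationRequired(f"{call.action} needs explicit user approval")
    return call
```

The design choice that matters is where `confirmed_by_user` originates. If the model can produce it, for example by emitting a "user approved" string that the orchestrator trusts, the gate collapses into the bypass described in the Action Controls row.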

Output Guardrails

Output guardrails inspect what the agent sends to other systems and users.

  • Policy: Configure content safety policies to restrict agent rendering of external URLs and image content in responses; counters the auto-fetch exfiltration channel exploited through the confirmed zero-click attack chain.
  • Configuration: Deploy network-layer URL filtering blocking outbound requests to non-allowlisted domains from service endpoints; counters the proxy domain exfiltration path that bypassed content security policy restrictions.
  • Engineering: Enable communication compliance policies monitoring agent output for patterns matching known exfiltration encodings, including Unicode smuggling and encoded data in URLs; counters the ASCII smuggling staging technique demonstrated by independent researchers.
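ASCII smuggling hides text in characters from the Unicode Tags block (U+E0000 to U+E007F). Most renderers display nothing for these characters. However, each tag character from U+E0020 to U+E007E sits exactly 0xE0000 above a printable ASCII character, so code can recover the payload trivially.

The minimal detection sketch below operates on agent output strings and assumes the operator can intercept responses before delivery. The function names are hypothetical.

```python
TAG_START, TAG_END = 0xE0000, 0xE007F  # Unicode "Tags" block


def find_tag_characters(text: str) -> list:
    """Indices of Tags-block characters (invisible in most renderers)."""
    return [i for i, ch in enumerate(text) if TAG_START <= ord(ch) <= TAG_END]


def decode_smuggled(text: str) -> str:
    """Recover the hidden ASCII: tag U+E0020..U+E007E maps to 0x20..0x7E."""
    return "".join(chr(ord(ch) - TAG_START) for ch in text
                   if 0xE0020 <= ord(ch) <= 0xE007E)


def strip_tags(text: str) -> str:
    """Remove every Tags-block character, leaving the visible text."""
    return "".join(ch for ch in text if not TAG_START <= ord(ch) <= TAG_END)
```

For monitoring, flagging is arguably more useful than silent stripping, because the decoded payload is itself evidence of an injection attempt.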

Monitoring

Monitoring captures what the agent did and surfaces anomalies for review.

  • Policy: Forward unified audit log agent events to a SIEM with correlation rules detecting injection indicators, including repeated grounding failures and high-frequency file access within single sessions; counters the absent default anomaly detection.
  • Configuration: Deploy analytics rules specifically targeting agent interaction anomalies using dedicated telemetry tables; counters the operator blind spot on real-time injection detection within legitimate user sessions.
  • Engineering: Enable insider risk management signals for agent sessions exhibiting data exfiltration patterns, including sequential access to classified documents followed by outbound communication; counters the gap between compliance logging and active threat detection.
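The first monitoring tip describes a correlation rule: many distinct file accesses inside one session within a short window. The minimal sketch below runs over already-exported audit records. It assumes a simplified, hypothetical record shape (`session_id`, `timestamp`, `operation`, `resource`). Real Unified Audit Log records for Copilot interactions use a different and richer schema, so fields would need mapping. The thresholds are placeholders to tune per tenant.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AuditEvent:
    """Simplified, hypothetical record; map from real audit fields before use."""
    session_id: str
    timestamp: datetime
    operation: str  # illustrative value used below: "FileAccessed"
    resource: str


def high_frequency_sessions(events, max_files=20, window=timedelta(minutes=5)):
    """Session IDs that touched more than `max_files` distinct files in any `window`."""
    by_session = defaultdict(list)
    for e in events:
        if e.operation == "FileAccessed":
            by_session[e.session_id].append(e)

    flagged = set()
    for sid, accesses in by_session.items():
        accesses.sort(key=lambda e: e.timestamp)
        start = 0
        for end in range(len(accesses)):
            # Slide the left edge until the window spans at most `window`.
            while accesses[end].timestamp - accesses[start].timestamp > window:
                start += 1
            if len({e.resource for e in accesses[start:end + 1]}) > max_files:
                flagged.add(sid)
                break
    return flagged
```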

7 References

The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [7, 13]) refer to entries below, listed in citation order.

Selected Vulnerabilities

  1. CVE-2025-32711: EchoLeak zero-click prompt injection in M365 Copilot (CVSS 9.3). A crafted email triggered data exfiltration via auto-fetched image URLs, bypassing XPIA filters. Patched server-side May 2025.
  2. CVE-2026-24307: Improper input validation in M365 Copilot (CVSS 9.3). An unauthorized attacker could disclose information over a network without authentication. Patched server-side January 2026.
  3. CVE-2026-24299: Command injection in M365 Copilot (CVSS 5.3). Demonstrated at DEF CON as Copirate 365, including memory manipulation and persistent data exfiltration. Patched March 2026.
  4. CVE-2026-26129: Critical information disclosure in M365 Copilot Business Chat via improper neutralization of output used by a downstream component (CVSS 7.5). Patched May 2026.

Selected Research

  5. EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit. Aim Security research demonstrating the full EchoLeak attack chain against M365 Copilot, including XPIA bypass and auto-fetch exfiltration.
  6. Microsoft Copilot: From Prompt Injection to Exfiltration of Personal Information. Rehberger demonstrated prompt injection via malicious email triggering automatic tool invocation, and ASCII smuggling for data exfiltration.
  7. Copirate 365 at DEF CON. DEF CON Singapore presentation demonstrating memory manipulation, CSS-based data exfiltration, and persistent theft chains in M365 Copilot.
  8. Living off Microsoft Copilot. Bargury demonstrated remote Copilot execution via email-triggered search and exfiltration, plus DLP bypass, using the LOLCopilot offensive toolset at Black Hat USA 2024.

Vendor Documentation

  9. Security for Microsoft 365 Copilot. Primary vendor security documentation describing identity controls, tenant isolation, data governance, and Entra ID integration architecture.
  10. How Microsoft defends against indirect prompt injection attacks. MSRC blog describing Prompt Shields, TaskTracker internal-state analysis, FIDES information-flow control, and the LLMail-Inject challenge.
  11. Data Privacy and Security for Microsoft 365 Copilot. Documents the permissions model, prompt injection blocking, encryption at rest and in transit, and compliance commitments.
  12. Microsoft Purview DLP for Microsoft 365 Copilot. Documents how DLP policies can block Copilot from responding or grounding when prompts contain sensitive information types.

Other Sources

  13. Critical Microsoft 365 Copilot Vulnerabilities Expose Sensitive Information. Security news coverage of the May 2026 triple-CVE disclosure, documenting critical information disclosure vulnerabilities patched server-side.
  14. EchoLeak AI Attack Enabled Theft of Sensitive Data via Microsoft 365 Copilot. SecurityWeek coverage providing operator-facing analysis of the CVE-2025-32711 disclosure, including attack mechanics and scoped data access.
  15. How to Weaponize Microsoft Copilot for Cyberattackers. Dark Reading coverage of the Black Hat USA 2024 LOLCopilot demonstrations, including remote Copilot execution and post-compromise spear-phishing.
  16. EchoLeak: A Reminder That AI Agent Risks Are Here to Stay. Zenity Labs analysis connecting EchoLeak to the broader M365 Copilot attack surface and explaining why the underlying injection issues persist.