1 Key Risks
The most critical security risks an operator inherits when deploying this agent in its documented default configuration. The dominant risk shape combines demonstrated indirect prompt injection and SSRF exploitation against broad connector authority with defense controls that detected but did not block data exfiltration.
2 AIRQ Scores
The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. Demonstrated exploitation across input, tool execution, and output channels drives the attack surface score, while partial defense controls that detected but failed to block attacks limit risk reduction.
The high attack surface and moderate blast radius, combined with defense controls that were bypassed in multiple independent exercises, place this agent in the Fortified Leaders quadrant.
Each score reflects the documented default configuration, with evidence drawn from agent-specific vulnerabilities, independent security research, and vendor documentation.
| Metric | Score | Comments |
|---|---|---|
| AIRQ Score | 6.79 | Composite risk driven by demonstrated exploitation of multiple attack surfaces, moderate blast radius through internal infrastructure access, and partially effective defense controls that detected threats without blocking them. |
| Blast Radius | 6.25 / 10 | Blast factors anchored on confirmed SSRF reaching internal services and managed identity token compromise, with autonomous agents executing privileged actions absent operator approval on the default posture. |
| Attack Surface | 7.52 / 10 | Evidence base includes agent-specific NVD entries covering SSRF, prompt injection, and XSS, independent red-team exercises demonstrating data exfiltration, and a MITRE ATLAS case study mapping autonomous agent exploitation techniques. |
| Defense Controls | 8 / 15 | Built-in prompt shields and connector policies exist but were bypassed in independently demonstrated exercises, with runtime monitoring requiring third-party webhook integration not active by default. |
3 Attack Surface
Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. Demonstrated indirect prompt injection, SSRF via HTTP connectors, and data exfiltration through email output make input ingestion, tool execution, and output processing the dominant exposure channels.
Evidence penalties reflect agent-specific CVEs and independently demonstrated exploitation; surfaces without penalties are scored on documented capabilities and architectural exposure.
Each surface is scored on its default exposure and adjusted upward when agent-specific vulnerabilities or independently demonstrated attacks confirm exploitation of the documented configuration.
| Surface | Score | Comments |
|---|---|---|
| User Input | 5 / 4 | Agents accept input across web chat, Teams channels, email, SharePoint form inputs, and the Direct Line API without mandatory authentication. CVE-2026-21520 demonstrated indirect prompt injection via a SharePoint form field that overrode system instructions and exfiltrated data through a connected output channel. The built-in prompt shield flagged the anomaly but did not block the resulting action. Independent research demonstrated related injection techniques via ASCII smuggling in the broader plugin architecture shared with this platform. [4][6][17] |
| External Data | 5 / 4 | SharePoint Lists, Dataverse tables, uploaded documents, and knowledge sources feed agent context as external data on the default configuration. CVE-2026-21520 demonstrated that form field content was concatenated with system instructions without sanitization, allowing attacker-controlled external data to hijack agent behavior and trigger exfiltration via connected tools. The knowledge source ingestion path treats all configured data as trusted. [4][6] |
| Memory | 1 / 4 | Session context persists within a conversation but does not survive across sessions by default. No cross-session memory poisoning surface is documented, and knowledge sources are operator-managed SharePoint sites and Dataverse tables rather than agent-writable memory stores. The absence of persistent cross-session state limits the memory surface to within-session context manipulation only. [9] |
| Reasoning | 2 / 4 | Generative orchestration routes user intents to topics and actions using the underlying language model. The reasoning layer is exposed to indirect prompt injection payloads that reached it through external data channels in demonstrated attacks. Vendor documentation describes a defense-in-depth stack including prompt shields and deterministic exfiltration blocking at the reasoning boundary, though the shields detected but did not prevent goal hijack in the demonstrated exercise. [18] |
| Planning | 3 / 4 | Autonomous agents use generative orchestration to plan multi-step action sequences triggered by events, schedules, or inbound messages without operator intervention. A case study documented an autonomous agent planning and executing a multi-step data exfiltration sequence without human approval, demonstrating that the planning layer can be steered by injected goals to sequence privileged connector actions. Vendor documentation confirms event-triggered and scheduled planning as default capabilities for autonomous agent configurations. [8][19] |
| Tool Execution | 5 / 4 | The agent invokes Power Platform connectors, HTTP request actions, Code Interpreter sessions, and MCP tool calls under delegated OAuth credentials. CVE-2024-38206 (CVSS 8.5) demonstrated SSRF via HttpRequestAction that bypassed protection through HTTP redirect chains, reaching internal infrastructure and enabling managed identity token theft with cross-tenant data store access. The tool execution boundary was insufficient to constrain redirect-chain attacks against the HTTP connector surface. [1][5] |
| Orchestration | 3 / 4 | Generative orchestration sequences topic routing, connector invocations, and child agent delegation in a single session context. Vendor security research acknowledges orchestration-layer threats where agents plan and sequence privileged actions autonomously, and the external threat detection webhook exists specifically to intercept unsafe orchestration decisions at runtime. The webhook is opt-in, leaving deployments without the integration exposed to orchestration-layer manipulation. [12][16] |
| Inter-Agent | 3 / 4 | The platform supports child agents, connected agents, and MCP-based agent-to-agent communication on the default configuration. The external threat detection interface documents built-in prompt shields for both direct and indirect prompt injection in inter-agent tool invocations, confirming the vendor treats this surface as a security boundary requiring active defense. The shields operate at the orchestration boundary between parent and child agents. [12] |
| Output Processing | 5 / 4 | Agent responses reach end users and external systems through web chat, Teams, email connectors, and data write-backs. CVE-2026-21520 showed that data was exfiltrated through the email connector even though the safety layer detected the anomaly, and a separate red-team exercise extracted full CRM records through the identical output path via inbox-based prompt injection. Connector-level DLP enforcement is available but must be explicitly activated by administrators. [4][7] |
| Configuration | 4 / 4 | CVE-2024-49038 (CVSS 9.3) demonstrated cross-site scripting in the authoring interface that enabled unauthenticated privilege escalation via improper input neutralization. CVE-2024-43610 (CVSS 7.4) disclosed information exposure through a network attack vector against the platform. The web-based authoring surface was vulnerable to input neutralization failures that directly compromised agent configuration integrity for any agent built on the affected instance. [2][3] |
The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session. Together, these three conditions turn an isolated prompt injection into full-chain exfiltration. This agent processes untrusted SharePoint form and email content, accesses private Dataverse and SharePoint data through delegated OAuth scopes, and sends data externally via email and connector write-backs. All three conditions are confirmed by agent-specific CVEs and independent red-team exercises.
Microsoft Copilot Studio exhibits all three of these conditions in its documented default configuration:
- Untrusted input — Agents ingest untrusted content from web chat, email inboxes, and SharePoint forms; CVE-2026-21520 confirmed that form field payloads reached the reasoning loop and overrode system instructions. [4][6]
- Sensitive data — Agents read private SharePoint Lists, Dataverse records, and mailbox content through delegated OAuth scopes; CVE-2024-38206 demonstrated that managed identity tokens granted access to internal data store instances. [1]
- External egress — Agents send bytes externally via Outlook email, Teams messages, and connector write-backs; CVE-2026-21520 and an independent exercise both demonstrated data exfiltration through the email output channel. [4][7]
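The three-condition test above can be expressed as a simple conjunction. The following is a minimal sketch, assuming a hypothetical `SessionCapabilities` record that an operator fills in per agent; neither the class nor the function is part of any Copilot Studio API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCapabilities:
    """Hypothetical summary of what a single agent session can do."""
    reads_untrusted_input: bool  # e.g. SharePoint form fields, inbound email
    reads_private_data: bool     # e.g. Dataverse records via delegated OAuth
    can_send_externally: bool    # e.g. Outlook email, connector write-backs


def trifecta_triggered(caps: SessionCapabilities) -> bool:
    """Return True only when all three conditions hold in the same session."""
    return (
        caps.reads_untrusted_input
        and caps.reads_private_data
        and caps.can_send_externally
    )


def missing_conditions(caps: SessionCapabilities) -> list:
    """List the conditions that are absent, i.e. what breaks the chain."""
    labels = {
        "untrusted input": caps.reads_untrusted_input,
        "sensitive data": caps.reads_private_data,
        "external egress": caps.can_send_externally,
    }
    return [name for name, present in labels.items() if not present]
```

Removing any one condition breaks the chain: an agent with `can_send_externally=False` still reads untrusted and private data, but `trifecta_triggered` returns `False` for it.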
4 Blast Radius
The blast radius is what an attacker who controls the agent can reach: which systems they touch, which credentials they read, and which actions they take without operator approval. Demonstrated network traversal to internal infrastructure and credential theft via managed identity tokens anchor the highest blast factors, and autonomous actions fire without an operator approval gate.
Factors scored at the ceiling carry agent-specific evidence of exploitation; lower factors reflect documented capability boundaries without demonstrated escalation.
Each factor reflects the maximum confirmed damage an attacker could achieve through the agent given demonstrated exploits and documented default capabilities.
| Factor | Score | Comments |
|---|---|---|
| Code execution | 2 / 4 | Code Interpreter runs in ephemeral sandboxed VMs with no inbound or outbound network access, session-scoped with no persistence, and real-time malicious pattern scanning. Connector-invoked actions execute within Power Platform runtime boundaries but do not provide arbitrary code execution beyond their defined scope. The sandbox isolation is vendor-documented and independently verifiable. [14] |
| File system access | 2 / 4 | Agents access SharePoint document libraries, Dataverse file attachments, and OneDrive files through configured connectors with delegated OAuth scopes. File system access is scoped to what the connector credentials authorize rather than raw host filesystem traversal, limiting blast to the data the operator chose to expose through connector configuration. [9] |
| Network access | 3 / 4 | CVE-2024-38206 demonstrated that SSRF via HttpRequestAction could reach Azure IMDS, internal subnets, and data store instances from within the agent network boundary. The redirect-chain bypass proved the agent runtime had network visibility into shared-tenant infrastructure that protection mechanisms failed to constrain. [1][5] |
| Credential access | 3 / 4 | CVE-2024-38206 demonstrated managed identity token theft via IMDS, granting read/write access to internal data stores including cross-tenant data access before the vulnerability was patched server-side. The stolen token carried the agent runtime managed identity scope, proving that credential exposure extended beyond the intended authorization boundary. [1] |
| Autonomous action | 3 / 4 | Event-triggered and scheduled agents execute privileged connector actions without per-invocation operator approval on the default posture. A case study documented an inbox-triggered agent autonomously exfiltrating data through connected tools absent any human-in-the-loop gate. Connector-level DLP policies can constrain autonomous actions but are not active by default. [8][19] |
| Deployment access | 2 / 4 | Agents are published to channels by makers through the web-based authoring interface. CVE-2024-49038 demonstrated that XSS in the authoring surface could escalate privileges, but deployment access is scoped to the platform environment and does not extend to underlying cloud infrastructure. [3] |
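The report does not state how the headline Blast Radius score is derived from the factor table. One reading that reproduces the 6.25 / 10 shown in the AIRQ Scores table is an unweighted sum of the six factor scores, normalized to a 10-point scale. The sketch below works through that arithmetic; the formula and the function name are assumptions, not the published scoring method.

```python
# Factor scores copied from the Blast Radius table (each out of 4).
BLAST_FACTORS = {
    "Code execution": 2,
    "File system access": 2,
    "Network access": 3,
    "Credential access": 3,
    "Autonomous action": 3,
    "Deployment access": 2,
}
MAX_PER_FACTOR = 4


def normalized_score(factors, max_per_factor, scale=10.0):
    """Assumed formula: sum of factor scores over the maximum possible sum."""
    total = sum(factors.values())              # 15 for the table above
    ceiling = max_per_factor * len(factors)    # 4 * 6 = 24
    return scale * total / ceiling


blast_radius = normalized_score(BLAST_FACTORS, MAX_PER_FACTOR)
print(blast_radius)  # 6.25
```

Under the same assumption, raising any single factor by one point moves the headline score by 10 / 24, or about 0.42.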
5 Defense Controls
Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. Built-in prompt shields and connector-level DLP policies exist but were bypassed in demonstrated exercises, and runtime behavioral detection requires third-party webhook integration not active by default.
Higher scores indicate stronger default controls; confidence markers reflect whether the control was verified through independent testing or documented by the vendor only.
Each control is scored on what runs by default without operator configuration, with confidence reflecting the strength of the verification evidence.
| Component | Score | Comments |
|---|---|---|
| Input Guardrails | 1 / 3 | Built-in prompt shields detect both direct and indirect prompt injection at runtime, with the external threat detection webhook providing a secondary enforcement layer. However, CVE-2026-21520 demonstrated that the safety mechanism flagged an indirect injection payload yet did not block the resulting exfiltration action. Vendor documentation describes deterministic exfiltration blocking as part of the defense-in-depth stack, but the demonstrated bypass places default input guardrails at the detection-without-enforcement tier. [4][6][12][18] |
| Execution Isolation | 2 / 3 | The code execution sandbox uses ephemeral VMs isolated from all network traffic, vendor-documented as a security boundary with runtime malware scanning. However, HTTP connectors lack equivalent containment — CVE-2024-38206 demonstrated SSRF that escaped the agent runtime boundary through redirect chains. The sandboxed code path reduces blast while the unsandboxed HTTP path remains exposed to network-traversal attacks. [1][14] |
| Action Controls | 2 / 3 | Connector-level DLP policies allow administrators to block or allow specific agent capabilities, and authentication configuration supports requiring sign-in for web chat agents. However, DLP policies are not enforced by default and must be explicitly configured by platform administrators, with the default enforcement state depending on tenant-level configuration choices. [10][15] |
| Output Guardrails | 1 / 3 | The platform includes output safety mechanisms that detected the indirect prompt injection exfiltration attempt in the demonstrated ShareLeak exercise, but the safety layer flagged the anomaly without blocking the email send action. Connector-level DLP policies can restrict outbound channels but are opt-in rather than enforced by default. Output controls exist but do not prevent data leaving the trust boundary on the default posture. [4][7][10] |
| Monitoring | 2 / 3 | Platform audit logs record administrative, authoring, and end-user activities, and conversation records persist in the backend data store. The platform's compliance posture is documented through industry certifications including SOC, ISO 27001, and FedRAMP. Real-time behavioral alerting for autonomous agent behavior depends on the external threat detection webhook with third-party integration, which is documented but not enabled by default. [11][13][16] |
6 Hardening Tips
Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. Operators can reduce the demonstrated attack surface by requiring authentication, enforcing connector-level DLP policies, and integrating runtime threat detection before publishing agents to production channels.
Input Guardrails
Input guardrails intercept adversarial content before it reaches the reasoning loop.
- Policy: Require Entra ID sign-in for all web chat agents to eliminate anonymous input channels. This counters the unauthenticated input path exploited in CVE-2026-21520.
- Configuration: Enable the external threat detection webhook with a security provider to intercept indirect prompt injection payloads at runtime. This counters the detection-without-blocking gap.
- Engineering: Deploy a custom content safety classifier tuned to the agent knowledge domain to catch domain-specific injection patterns the built-in shields miss.
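The detection-without-blocking gap comes down to whether a flagged verdict actually stops the pending tool call. The following is a minimal sketch of a fail-closed gate, assuming a hypothetical `Verdict` produced by some upstream classifier; the types and the `enforce` function are illustrative and do not reflect the webhook's real request or response schema.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CLEAN = "clean"
    FLAGGED = "flagged"


@dataclass
class ToolCall:
    """Hypothetical description of a tool invocation the agent wants to make."""
    tool: str
    arguments: dict


class BlockedToolCall(Exception):
    """Raised when a flagged input would otherwise lead to a tool call."""


def enforce(verdict: Verdict, call: ToolCall, execute):
    """Run the tool call only when the input was not flagged.

    A detection-only deployment would log the verdict and still call
    execute(call); here a FLAGGED verdict stops execution entirely.
    """
    if verdict is Verdict.FLAGGED:
        raise BlockedToolCall(f"blocked {call.tool}: input was flagged")
    return execute(call)
```

The design choice is that the gate raises instead of returning a warning, so a caller cannot ignore the verdict by accident.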
Execution Isolation
Execution isolation contains what a compromised agent can do on the host.
- Policy: Mandate that all custom connectors use only allowlisted destination hosts to prevent redirect-chain SSRF. This counters the bypass demonstrated in CVE-2024-38206.
- Configuration: Disable HttpRequestAction for agents that do not require outbound HTTP, restricting tool execution to verified connectors. This counters open HTTP surfaces.
- Engineering: Implement network security groups on the platform environment to restrict outbound traffic from agent runtimes to known-safe endpoints only.
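A destination allowlist only stops redirect-chain SSRF if every hop is checked, not just the first URL. The sketch below illustrates that idea with the standard library; the allowlist contents, the host name `api.contoso.example`, and both function names are assumptions for illustration, and the check does not address DNS rebinding.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of exact host names a custom connector may reach.
ALLOWED_HOSTS = {"api.contoso.example"}


def is_allowed(url: str) -> bool:
    """Accept only HTTPS URLs whose host is exactly on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    # Literal IPs such as the IMDS address 169.254.169.254 fail here
    # simply because they are not on the allowlist.
    return parts.hostname.lower() in ALLOWED_HOSTS


def first_disallowed_hop(chain):
    """Return the first URL in a redirect chain that fails the check, or None."""
    for url in chain:
        if not is_allowed(url):
            return url
    return None
```

In practice the HTTP client would need automatic redirect following turned off, so that each `Location` target passes through `is_allowed` before the next request is sent.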
Action Controls
Action controls govern which tools and actions the agent can invoke autonomously.
- Policy: Enforce connector-level DLP policies in the admin center to block sensitive connectors from agent use by default. This counters the opt-in enforcement gap.
- Configuration: Configure human-in-the-loop approval flows for high-privilege actions in autonomous agent topics. This counters approval-free autonomous execution.
- Engineering: Implement approval steps within agent flows to require explicit operator sign-off before executing connector actions that modify external state.
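An approval step can be pictured as a dispatcher that queues state-changing actions instead of running them. The following is a minimal sketch under that assumption; the action names, `ApprovalQueue`, and `dispatch` are hypothetical and stand in for whatever approval mechanism the operator's flows actually use.

```python
from dataclasses import dataclass, field

# Hypothetical set of connector actions that modify external state.
STATE_CHANGING_ACTIONS = {"send_email", "update_record", "delete_record"}


@dataclass
class ApprovalQueue:
    """Holds actions that wait for explicit operator sign-off."""
    pending: list = field(default_factory=list)

    def submit(self, action, arguments):
        self.pending.append((action, arguments))
        return len(self.pending) - 1  # ticket number for later approval


def dispatch(action, arguments, execute, queue):
    """Execute read-only actions directly; park state-changing ones."""
    if action in STATE_CHANGING_ACTIONS:
        ticket = queue.submit(action, arguments)
        return {"status": "pending_approval", "ticket": ticket}
    return {"status": "executed", "result": execute(action, arguments)}
```

With this shape, an injected goal can still request `send_email`, but the request stops in the queue until an operator reviews it.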
Output Guardrails
Output guardrails inspect what the agent sends to other systems and users.
- Policy: Restrict the email connector to internal recipients only for agents processing untrusted input. This counters the external exfiltration channel demonstrated in CVE-2026-21520.
- Configuration: Configure data loss prevention policies to scan and block sensitive data patterns in outbound email and messages sent by agents.
- Engineering: Deploy a custom output filter via the threat detection webhook that inspects agent payloads for data patterns matching the configured knowledge sources.
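The internal-recipients restriction amounts to a domain check on every outbound address. The sketch below shows one way to express it, assuming a hypothetical internal domain `contoso.example`; the function is illustrative and is not a connector or DLP API.

```python
# Hypothetical set of domains treated as internal to the organization.
INTERNAL_DOMAINS = {"contoso.example"}


def split_recipients(recipients, internal_domains=INTERNAL_DOMAINS):
    """Partition addresses into (allowed, rejected) by their domain.

    Addresses without a local part or an '@' are rejected rather than guessed.
    """
    allowed, rejected = [], []
    for raw in recipients:
        address = raw.strip()
        local, sep, domain = address.rpartition("@")
        if sep and local and domain.lower() in internal_domains:
            allowed.append(address)
        else:
            rejected.append(address)
    return allowed, rejected
```

A filter like this would refuse to send when `rejected` is non-empty, instead of silently dropping the external addresses, so that the attempt is visible to reviewers.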
Monitoring
Monitoring captures what the agent did and surfaces anomalies for review.
- Policy: Mandate that all production agents have runtime monitoring configured via the threat detection webhook before publishing to any channel.
- Configuration: Forward audit logs to the organization SIEM with alerting rules for anomalous agent behaviors such as bulk data reads followed by email sends.
- Engineering: Implement conversation transcript review workflows using stored transcripts to detect prompt injection attempts and unexpected tool invocations.
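The "bulk data reads followed by email sends" rule can be sketched as a pass over time-ordered audit events. The example below is a minimal illustration; the `AuditEvent` fields, the action names, and the thresholds are assumptions and do not mirror the actual audit log schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical, simplified audit record."""
    session_id: str
    timestamp: float  # seconds since an arbitrary epoch
    action: str       # "read_record" or "send_email" in this sketch


def suspicious_sessions(events, read_threshold=50, window_seconds=300):
    """Flag sessions where an email send follows a burst of record reads.

    A session is flagged when a send_email event is preceded by at least
    read_threshold read_record events within window_seconds.
    """
    flagged = set()
    reads = {}  # session_id -> timestamps of reads seen so far
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.action == "read_record":
            reads.setdefault(event.session_id, []).append(event.timestamp)
        elif event.action == "send_email":
            recent = [
                t for t in reads.get(event.session_id, [])
                if event.timestamp - t <= window_seconds
            ]
            if len(recent) >= read_threshold:
                flagged.add(event.session_id)
    return flagged
```

The thresholds are placeholders; a real rule would be tuned against the normal read volume of each agent.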
7 References
The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [4][7]) refer to entries below, listed in citation order.
Selected Vulnerabilities
- [1] CVE-2024-38206: SSRF protection bypass in Copilot Studio (CVSS 8.5). An authenticated attacker exploits HTTP redirect chains to reach Azure IMDS and internal Cosmos DB instances, gaining read/write access to shared-tenant infrastructure. Patched server-side by Microsoft.
- [2] CVE-2024-43610: Information disclosure in the Copilot Studio web application (CVSS 7.4). An unauthenticated attacker exploits a network-accessible vector to view sensitive platform configuration or tenant metadata. Patched server-side.
- [3] CVE-2024-49038: Cross-site scripting in Copilot Studio leading to privilege escalation (CVSS 9.3). An unauthenticated attacker escalates privileges via improper input neutralization during web page generation. Patched server-side.
- [4] CVE-2026-21520: ShareLeak indirect prompt injection in Copilot Studio (CVSS 7.5). An unauthenticated attacker injects a payload via a SharePoint form field that overrides agent system instructions and exfiltrates SharePoint List data via Outlook email. Patched January 2026.
Selected Research
- [5] Tenable Research TRA-2024-32: Tenable demonstrates the SSRF chain exploiting Copilot Studio HttpRequestAction: redirect to IMDS, managed identity token theft, and Cosmos DB read/write with cross-tenant implications.
- [6] ShareLeak (Capsule Security): Capsule Security discovers and demonstrates indirect prompt injection via SharePoint form input, showing that safety mechanisms flagged the request but DLP did not block the exfiltration.
- [7] AIjacking data exfiltration (Zenity Labs): Zenity Labs demonstrates full CRM data exfiltration from a Copilot Studio customer service agent via prompt injection through an email inbox, enumerating knowledge sources and invoking the email tool.
- [8] MITRE ATLAS AML.CS0037: MITRE ATLAS case study documenting the Zenity exercise against a Copilot Studio agent, mapping to AML.T0086 (Exfiltration via AI Agent Tool Invocation) and AML.T0065 (LLM Prompt Crafting).
Vendor Documentation
- [9] Copilot Studio security and governance: Microsoft documents Copilot Studio DLP enforcement, SDL practices, geographic data residency, and governance controls available through the Power Platform admin center.
- [10] Data loss prevention policy configuration: Microsoft documents connector-level data policies that admins configure to allow or block specific agent capabilities, with enforcement active since early 2025.
- [11] Compliance certifications: Microsoft documents SOC, ISO 27001, HIPAA, FedRAMP, PCI DSS, and CSA STAR compliance coverage for Copilot Studio as a Power Platform Online Service.
- [12] External threat detection for agents: Microsoft documents built-in UPIA/XPIA prompt shields and the webhook-based external threat detection interface that third-party providers or Defender can use to block tool invocations at runtime.
- [13] Audit logging via Microsoft Purview: Microsoft documents Purview audit logging for admin, maker, and user activities, with conversation transcripts stored in Dataverse and accessible via the Office 365 Management API.
- [14] Code Interpreter security architecture: Microsoft documents the Code Interpreter sandbox: ephemeral Azure VMs with full network isolation, scoped to a single session with no data persistence, and active malware pattern detection.
- [15] Authentication configuration: Microsoft documents authentication options including Entra ID, generic OAuth 2, federated identity, and the require-sign-in toggle that controls unauthenticated access to web chat agents.
Other Sources
- [16] Runtime defense for AI agents (Microsoft Security Blog): Microsoft Defender team examines three adversarial scenarios targeting Copilot Studio agent orchestration and demonstrates webhook-based runtime checks that detect and block unsafe tool invocations.
- [17] M365 Copilot prompt injection via ASCII smuggling (Embrace The Red): Johann Rehberger demonstrates prompt injection via malicious email, ASCII smuggling for invisible data staging, and exfiltration via clickable hyperlinks in the broader Copilot plugin architecture shared with Copilot Studio.
- [18] Defending against indirect prompt injection (MSRC): Microsoft Security Response Center describes the defense-in-depth stack against indirect prompt injection: Prompt Shields via Azure AI Content Safety, deterministic exfiltration blocking, and human-in-the-loop approval patterns.
- [19] Autonomous agents design guidance: Microsoft documents autonomous agent capabilities in Copilot Studio: event triggers, scheduled triggers, decision boundaries, and security guardrails for agents that act without user prompts.