Claude Cowork Agent Security Risks

Computer Agents claude.com Exposed Giants
AI RISK QUADRANT POSITION DEFENSE CONTROLS (2) ATTACK SURFACE (5.64) EXPOSED GIANTS FORTIFIED LEADERS HUMBLE PROVIDERS TIGHT OPERATORS
AIRQ Score
4.19
High
Attack Surface
5.64
High
Blast Radius
6.75
High
Defense Controls
2
Critical
About The Agent

Claude Cowork is an autonomous desktop knowledge-work agent from Anthropic that executes multi-step tasks on the operator's machine using shell access, computer use capabilities, and OAuth-scoped connectors to services including Slack, Google Workspace, and Microsoft 365. The default configuration runs commands and browser automation directly on the host operating system, relying on per-app permission prompts and trained model resistance as the primary execution boundary rather than container or VM isolation.

About the AI Risk Quadrant

Exposed Giants are agents whose capability exposure outpaces their documented defense controls — the attack surface is broad enough to concern an operator, but the blast radius has not yet reached the level where compromise equates to full infrastructure takeover. Claude Cowork places here because its ten input channels and unsandboxed desktop execution create consistent attack surface exposure, while the absence of deployment pipeline access keeps blast radius below the top tier.

1 Key Risks

The most critical security risks an operator inherits when deploying this agent in its documented default configuration. Claude Cowork combines unsandboxed desktop execution with full connector OAuth scope and no output controls, creating consistent exposure across all five defense components on its default configuration.

Key Input Risks
Claude Cowork ingests content from browser pages during computer use, MCP server outputs, connector data from Slack and Google Workspace, and local files in operator-approved folders. Vendor safety documentation identifies web content as the primary prompt injection vector, with classifiers deployed only on computer-use screenshot inputs [7].
Key Execution Risks
Computer use executes shell commands, browser automation, and application control directly on the operator's desktop without any sandbox, container, or VM isolation boundary. Independent research demonstrated destructive command execution via indirect prompt injection through an obfuscated PDF payload processed during a computer use session [4].
Key Action Risks
Scheduled tasks, mobile-dispatched operations, and connector write actions fire autonomously without per-action operator approval once initial OAuth grants or folder approvals are given. Connector permissions inherit the operator's full scope in connected services including message sending and document editing [9].
Key Output Risks
Cowork emits files, connector messages, browser form submissions, and web search requests with no documented DLP or exfiltration blocking on any output channel. Independent research demonstrated data exfiltration from the Claude platform via the Anthropic Files API confirming output channels can be weaponized by injected prompts [6].
Key Monitoring Risks
The vendor enterprise admin guide confirms audit logs and the Compliance API do not cover Cowork activity in its current release. Autonomous scheduled tasks and connector operations execute without generating any security-relevant telemetry, leaving the operator blind to lateral movement or data exfiltration during unattended sessions [9].

2 AIRQ Scores

The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. Claude Cowork carries moderate defense credit relative to broad capability exposure and near-maximum blast across five of six impact factors.

AIRQ Metrics

Claude Cowork places in the Exposed Giants quadrant because its unsandboxed execution and ten input surfaces drive attack exposure above the midline while the absence of deployment pipeline access keeps blast radius below the top-tier threshold.

The four scores below summarize the agent's risk posture on its documented default configuration.

Metric Score Comments
AIRQ Score 4.19 Moderate composite score reflecting limited defense mitigation against broad exposure.
Blast Radius 6.75 / 10 Near-maximum reach across five factors with only deployment access scoring below ceiling.
Attack Surface 5.64 / 10 Consistent band-3 scoring across ten surfaces driven by multiple unvalidated input channels.
Defense Controls 2 / 15 Minimal controls: per-app prompts, deletion protection, and computer-use classifiers are the only documented defaults.

3 Attack Surface

Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. Ten input channels score at band 3 reflecting multiple unvalidated data sources and the absence of per-channel input shields beyond the computer-use prompt injection classifier.

Attack Surface Metrics

Scores reflect the default posture where computer use runs without sandbox and connectors hold full OAuth scope.

Each surface is scored against observable architectural conditions documented in vendor safety guidance and confirmed by independent security research.

Surface Score Comments
User Input 3 / 4 Multiple unvalidated input channels including screen content during computer use, browser text via Chrome extension, and typed prompts with prompt injection classifiers as the sole automated defense covering only screenshot inputs [7][4].
External Data 3 / 4 Ingests connector data from Slack, Google Workspace, and Microsoft 365 plus MCP server outputs and plugin responses without a dedicated prompt shield on any non-computer-use channel [8][5].
Memory 2 / 4 Project-scoped memory persists as unverified markdown files on disk with no integrity verification documented, though scope is limited to project boundaries and standalone sessions do not retain state [8].
Reasoning 2 / 4 Standard transformer reasoning with documented resistance training under ASL-3 deployment safety evaluations including prompt injection resistance testing; academic research measured attack success rates but full bypass under default constraints was not confirmed [10][5].
Planning 3 / 4 Autonomous multi-step task decomposition with scheduled execution running without per-action approval; vendor safety documentation warns that long-running tasks can compound errors without human checkpoints [8][5].
Tool Execution 3 / 4 Full shell access plus browser automation via computer use runs directly on the operator's desktop with no sandbox or container boundary; vendor explicitly documents the absence of execution isolation for this capability [7][4].
Orchestration 3 / 4 Plugins bundle sub-agents and connectors while MCP servers extend capabilities at runtime; the desktop app must remain open for task execution creating a persistent surface with no per-plugin security review [7][8].
Inter-Agent 3 / 4 Each MCP server and plugin introduces attack paths that inherit the agent's full permission scope; vendor warns that third-party integrations expand the attack surface without centralized vetting [8][13].
Output Processing 3 / 4 Connector write operations, browser form submissions, and file outputs proceed with no DLP or exfiltration blocking; the Oasis Security research chain confirmed that platform output channels can be silently weaponized by injected prompt payloads [8][6].
Configuration 3 / 4 Plugins and MCP marketplace expand agent capabilities without per-tool security review; connector OAuth grants persist indefinitely with no documented rotation or scope reduction policy [8][12].

The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. Claude Cowork reads untrusted web content and connector data, accesses private files and OAuth-scoped enterprise resources, and writes to external services — all within the same default session.

Lethal Trifecta · Complete (3 of 3)

Claude Cowork exhibits all three of these conditions in its documented default configuration:

  • Untrusted input — Ingests content from browser pages, MCP servers, connectors, and local files with no dedicated prompt shield beyond computer-use classifiers [7].
  • Sensitive data — Reads private files in approved folders, OAuth-scoped enterprise data from connectors, and authenticated browser content during computer use sessions [9].
  • External egress — Sends bytes externally via web search, connector writes, browser form submissions, and scheduled task outputs with no DLP or exfiltration controls [8].

4 Blast Radius

The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. Five of six blast factors score at the architectural maximum reflecting user-level access to code execution, file system, network, credentials, and autonomous action on the operator's desktop.

Blast Radius Metrics

Scores reflect the impact scope when an injected prompt gains the agent's desktop execution context on the operator's machine.

Each factor is scored against the scope of access the agent holds by default when operating under its documented configuration.

Factor Score Comments
Code execution 3 / 4 Computer use grants user-level shell access on the operator's actual desktop; commands run with the full privileges of the logged-in user including package installation and service management [7].
File system access 3 / 4 Read and write access across all user-approved workspace folders which may include home directories containing credentials, source repositories, and personal documents [8].
Network access 3 / 4 Unrestricted outbound network during computer use sessions: browser navigation to arbitrary URLs, web search, and connector API calls proceed without egress controls or domain allowlisting [7].
Credential access 3 / 4 Connector OAuth grants inherit the operator's full permissions in connected services; Slack message posting, Google Workspace document editing, and Microsoft 365 email all operate at the user's full scope [9].
Autonomous action 3 / 4 Scheduled tasks and mobile-dispatched operations execute on the operator's desktop without per-action human approval, enabling autonomous multi-step workflows across all available tools and connectors [8].
Deployment access 1 / 4 No documented CI/CD integration, infrastructure provisioning, or deployment pipeline access in the default Cowork configuration; blast radius is limited to the local desktop environment [12].

5 Defense Controls

Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. The default configuration provides minimal automated defense: only per-app permission prompts and computer-use prompt injection classifiers exist, with no sandbox, no DLP, and no audit coverage for Cowork sessions.

Defense Controls Metrics

The runtime provides prompt injection classifiers on screenshot inputs only; all other defense surfaces depend on operator-added tooling or third-party controls.

Each component is scored against what the agent's runtime does to detect, contain, or report adversarial activity on its documented default posture.

Component Score Comments
Input Guardrails 1 / 3 Prompt injection classifiers run on computer-use screenshots only with no dedicated prompt shield documented for connector inputs, MCP server outputs, or local file content ingested during non-computer-use sessions [7][10].
Execution Isolation 0 / 3 The agent's computer use capability operates without sandbox, container, or VM isolation on the operator's actual machine; vendor explicitly documents this as a research preview limitation and sandbox-runtime vulnerabilities in the sibling Claude Code product confirm the engineering difficulty of retrofitting isolation to this architecture [7][1][2][3].
Action Controls 1 / 3 Per-app permission prompts gate computer use application access and deletion protection requires explicit approval, but scheduled tasks and connector operations bypass per-action approval entirely once OAuth is granted; organizational certifications exist at the compliance level but do not translate to runtime enforcement [8][9][11].
Output Guardrails 0 / 3 No Cowork-native DLP, exfiltration blocking, or URL sanitization exists for any output channel; connector writes and browser submissions flow directly to external services without agent-level content inspection [8][6].
Monitoring 0 / 3 Audit logs and Compliance API explicitly do not cover Cowork activity per the enterprise admin guide; the default posture provides basic file-based session logs on the local machine but no active alerting or SIEM integration [9][12].

6 Hardening Tips

Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. Operators can materially reduce exposure by layering isolation, scoping permissions, and adding monitoring that the default configuration does not provide.

Input Guardrails

Input guardrails intercept adversarial content before it reaches the reasoning loop.

Input Guardrails
  • Policy Require all MCP servers to pass a security review before organizational deployment — counters unvetted plugin attack surface expansion.
  • Configuration Deploy a third-party prompt injection detection layer on connector inputs and file ingestion channels — counters the missing prompt shield for non-computer-use inputs.
  • Engineering Implement input sanitization for content processed from untrusted web pages during computer use — counters the browser-based indirect injection vector.

Execution Isolation

Execution isolation contains what a compromised agent can do on the host.

Execution Isolation
  • Policy Provision a dedicated virtual machine or container for Cowork computer use sessions — counters the documented absence of any sandbox boundary.
  • Configuration Restrict computer use to a locked-down user account with minimal filesystem and network privileges — counters user-level privilege inheritance.
  • Engineering Deploy process allowlisting via endpoint protection rules limiting which binaries can execute during computer use sessions — counters the unrestricted application access surface.

Action Controls

Action controls govern which tools and actions the agent can invoke autonomously.

Action Controls
  • Policy Disable scheduled task execution and require manual approval for all autonomous operations — counters per-action approval bypass on time-triggered workflows.
  • Configuration Implement scope-limited OAuth grants for connectors with read-only defaults and explicit write approval — counters full-permission inheritance across connected services.
  • Engineering Configure the application block list to deny access to all sensitive applications beyond the default investment restriction — counters broad application access during computer use.

Output Guardrails

Output guardrails inspect what the agent sends to other systems and users.

Output Guardrails
  • Policy Deploy a DLP proxy on all outbound channels including connector writes and browser submissions — counters the complete absence of exfiltration blocking.
  • Configuration Implement URL allowlisting for browser navigation during computer use sessions — counters unrestricted outbound network access enabling data exfiltration.
  • Engineering Enable content inspection on connector write operations before messages reach external services — counters connector-inherited permissions allowing silent data posting.

Monitoring

Monitoring captures what the agent did and surfaces anomalies for review.

Monitoring
  • Policy Forward Cowork session activity to an enterprise SIEM using local log file collection — counters the documented audit gap for autonomous operations.
  • Configuration Implement file integrity monitoring on workspace folders to detect unauthorized modifications — counters the absence of change detection for file operations.
  • Engineering Deploy anomaly detection on connector API call patterns to identify unusual data access — counters the absence of behavioral alerting across connected services.

7 References

The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [7, 13]) refer to entries below, listed in citation order.

Selected Vulnerabilities

  1. CVE-2025-66479 Anthropic sandbox-runtime network isolation failure when no allowed domains configured allowing unrestricted outbound requests from sandboxed code; patched v0.0.16
  2. GHSA-9gqj-5w7c-vx47 GitHub Security Advisory for protection mechanism failure in Anthropic sandbox-runtime affecting network isolation
  3. Claude Code Sandbox Bypass SecurityWeek on two sandbox bypass vulns including SOCKS5 null-byte injection in Claude Code v2.0.24 through v2.1.89

Selected Research

  1. Indirect Prompt Injection of Claude Computer Use HiddenLayer demonstrates indirect prompt injection via obfuscated PDF causing Claude Computer Use to execute rm -rf on host
  2. SUDO Jailbreaking Computer-Use Agents Academic Detox2tox attack strategies against Claude Computer Use with 50-task benchmark measuring destructive-action success rates
  3. Claudy Day Prompt Injection and Exfiltration Oasis Security demonstrates invisible prompt injection plus data exfiltration via Anthropic Files API on claude.ai

Vendor Documentation

  1. Computer Use Safety Vendor documents computer use runs outside any sandbox on user desktop with per-app permission prompts
  2. Use Claude Cowork Safely Vendor safety guidance on prompt injection via web content and MCP/plugin attack surface expansion
  3. Claude Cowork Enterprise Admin Guide Enterprise docs confirming audit logs do not cover Cowork and connector OAuth inherits user permissions
  4. Claude Opus 4 and Sonnet 4 System Card Anthropic system card with agentic safety evaluations for computer use and prompt injection resistance testing
  5. Anthropic Certifications SOC 2 Type I/II, ISO 27001:2022, ISO/IEC 42001:2023, HIPAA-ready configuration confirmed

Other Sources

  1. Claude Cowork Enterprise Security Guide Third-party assessment identifying ten risk categories including connector blast radius and incomplete audit coverage
  2. Trustworthy Agents in Practice Anthropic five-principle framework for agent safety acknowledging prompt injection defenses are not guaranteed