Claude Code Agent Security Risks

Coding Agents | anthropic.com | Exposed Giants

[Figure: AI Risk Quadrant position, plotting Defense Controls (3) against Attack Surface (7.96). Quadrants: Exposed Giants, Fortified Leaders, Humble Providers, Tight Operators.]

AIRQ Score: 4.38 (High)
Attack Surface: 7.96 (Critical)
Blast Radius: 6.75 (High)
Defense Controls: 3 (Critical)

About The Agent

Claude Code is an agentic coding assistant that runs as a CLI process on the operator's workstation, connecting to Anthropic's Claude models for multi-step code generation, shell execution, file editing, and repository management. The same operator-scoped runtime drives shell, file system, and web-fetch tools, accepts inbound instructions from CLAUDE.md project files and MCP servers, loads cross-session memory at startup, and can orchestrate parallel subagent sessions and multi-agent teams, with every channel feeding the same prompt context under the same execution authority.

About the AI Risk Quadrant

Exposed Giants placement reflects a critical attack surface, driven by multiple confirmed command-injection and sandbox-escape vulnerabilities, combined with a high blast radius that reaches the operator's full home directory, credentials, and network stack. Defense controls contribute minimally in the documented default configuration, where sandboxing is opt-in, approval gates can be bypassed with a single flag, and telemetry requires manual enablement. Claude Code operators inherit broad tool authority with few vendor-enforced guardrails unless they actively layer on the available hardening options.

1 Key Risks

The most critical security risks an operator inherits when deploying this agent in its documented default configuration. Untrusted repository content can reach arbitrary code execution on the operator workstation because the default configuration ships without sandboxing, input filtering, or outbound data-loss prevention.

Key Input Risks
Untrusted content from repository files, CLAUDE.md instructions, MCP server outputs, and marketplace plugins enters the reasoning loop without a prompt shield or injection classifier. Multiple confirmation-bypass CVEs with CVSS scores up to 9.8 have enabled code execution from injected context content. [1][3][5]
Key Execution Risks
Full shell access runs with operator-level privileges and no sandbox by default on the CLI distribution. Two sandbox-escape CVEs reaching CVSS 10.0 have demonstrated arbitrary file writes and persistent hook injection from inside the opt-in sandbox boundary. [9][10]
Key Action Risks
Permission prompts gate tool calls by default, but a single CLI flag disables every approval check for the session. Progressive allowlisting permanently exempts command patterns without expiration, and marketplace plugin hooks can auto-approve commands without operator awareness. [13][17][21]
Key Output Risks
No outbound data-loss prevention or exfiltration-channel blocking ships by default. DNS exfiltration via allowlisted commands and SOCKS5 null-byte sandbox bypass have both been demonstrated as viable exfiltration paths from prompt-injected sessions. [2][11][15]
Key Monitoring Risks
OpenTelemetry-based audit logging is available but entirely opt-in, leaving the default CLI installation with no structured event trail and no anomaly detection. Operators who do not enable telemetry inherit a blind spot across every tool call and permission decision. [24][31]

2 AIRQ Scores

The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent's own controls reduce that risk, with the AIRQ composite summarizing all three. Claude Code sits in the lower-right quadrant: its critical attack surface is only weakly offset by sandboxing and monitoring controls that the vendor documents but leaves opt-in, so the default posture remains exposed.

AIRQ Metrics

The agent's attack surface sits in the critical band while blast radius falls just below the upper quadrant boundary, and defense controls contribute near the minimum on the default configuration.

Each axis is scored on a defined scale: attack surface out of ten, blast radius out of ten, defense controls out of fifteen, and the AIRQ composite out of fifteen.

Metric | Score | Comments
AIRQ Score | 4.38 | The composite reflects a wide attack surface that limited default defenses do little to offset, placing hardening priority on the operator rather than the vendor.
Blast Radius | 6.75 / 10 | Full shell, file system, and credential access at operator privilege drive the upper bands, with deployment reach kept moderate by the absence of dedicated deploy tools.
Attack Surface | 7.96 / 10 | Agent-specific NVD CVEs, sandbox-escape advisories, and demonstrated plugin hijacking anchor every high-scoring surface.
Defense Controls | 3 / 15 | Vendor-documented sandboxing and telemetry are opt-in only; approval prompts, a bundled but inactive sandbox, and telemetry support are the only nonzero controls on the default CLI.

3 Attack Surface

Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. The dominant exposures are the unfiltered input channels that auto-load project files and marketplace content, the bypassable tool-execution boundary, and the supply-chain surface of community plugins.

Attack Surface Metrics

Six of ten surfaces carry confirmed agent-specific vulnerabilities or demonstrated exploitation, and four reach the adjusted ceiling of 5.0.

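Each row ties a named surface to its base architectural band, any agent-specific evidence penalty, and a prose summary of the exposure. A score shown as 5 / 4 means the evidence penalty has lifted the surface above its four-point base band to the adjusted ceiling.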

Surface | Score | Comments
User Input | 5 / 4 | Multiple input channels with no prompt shield; $IFS parsing bypass enabled arbitrary code execution from injected context content. [1][3][5]
External Data | 4 / 4 | Auto-loads CLAUDE.md from project directories and MCP server outputs without content validation or integrity verification; malicious repo config triggered data leakage before trust confirmation. [12][23][33]
Memory | 3 / 4 | Cross-session auto memory persists learnings without integrity checks; directory change bypassed write protection to the .claude folder where memory files reside. [7][23]
Reasoning | 2 / 4 | Multi-step reasoning with visible chain-of-thought; vendor-provided Claude model with isolated WebFetch context windows for external content. [19]
Planning | 3 / 4 | Autonomous task decomposition with subagent delegation; approval bypassed via find command injection from injected context. [6][21]
Tool Execution | 5 / 4 | Unrestricted shell with user privileges and bypassable approval; sandbox escape via symlink following proved arbitrary writes outside the workspace boundary. [9][10]
Orchestration | 3 / 4 | Spawns subagent sessions and multi-agent teams with inter-agent messaging; plugin autoloading enabled arbitrary code execution through the orchestration layer. [29][30][34]
Inter-Agent | 4 / 4 | Connects to external MCP servers and community marketplace plugins without code signing; a hostname-validation flaw enabled unsanctioned outbound calls to adversary infrastructure. [13][17][28][35]
Output Processing | 5 / 4 | No outbound DLP or exfiltration blocking by default; DNS exfiltration via allowlisted commands demonstrated stealing API keys from injected file content. [2][11]
Configuration | 5 / 4 | Auto-executes CLAUDE.md from untrusted project directories; sandbox config injection via missing settings.json protection allowed persistent hook injection with host privileges. [4][9]

The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. Claude Code on its documented default ingests repository files and marketplace content, reads the operator's home directory and credential stores, and transmits bytes through shell commands and MCP tool calls without crossing any vendor-enforced outbound control.

Lethal Trifecta · Complete (3 of 3)

Claude Code exhibits all three of these conditions in its documented default configuration:

  • Untrusted input — Repository files, CLAUDE.md project instructions, MCP server outputs, and marketplace plugin content all enter the reasoning loop without filtering. [5][23]
  • Sensitive data — The agent reads source code, environment variables, credential files under the home directory, and OAuth tokens accessible at operator privilege. [15][18]
  • External egress — Shell commands, MCP tool calls, and WebFetch requests provide default egress channels; DNS exfiltration via allowlisted commands has been demonstrated. [2][15]
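The three-condition test above can be written as a small check. This is a minimal sketch: the `SessionCapabilities` type and both function names are hypothetical illustrations and not part of any Claude Code API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCapabilities:
    """Hypothetical summary of what one agent session can do."""
    reads_untrusted_input: bool   # e.g. repository files, MCP server outputs
    reads_sensitive_data: bool    # e.g. credential files, environment variables
    has_external_egress: bool     # e.g. shell network commands, WebFetch


def trifecta_conditions_met(caps: SessionCapabilities) -> int:
    """Count how many of the three trifecta conditions hold."""
    return sum([caps.reads_untrusted_input,
                caps.reads_sensitive_data,
                caps.has_external_egress])


def is_lethal_trifecta(caps: SessionCapabilities) -> bool:
    """The trifecta is complete only when all three conditions hold together."""
    return trifecta_conditions_met(caps) == 3


default_cli = SessionCapabilities(True, True, True)
egress_blocked = SessionCapabilities(True, True, False)
print(trifecta_conditions_met(default_cli), is_lethal_trifecta(default_cli))
print(trifecta_conditions_met(egress_blocked), is_lethal_trifecta(egress_blocked))
```

Removing any single condition, for example by blocking egress, breaks the chain even when the other two remain.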

4 Blast Radius

The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. Compromise of the agent reaches the operator's full shell environment, home-directory file system, credential stores, and outbound network stack at user-level privilege.

Blast Radius Metrics

Four of six factors sit at the upper bands, with autonomous action and deployment access kept moderate by default approval gates and the absence of dedicated deploy tooling.

Each row ties a blast factor to the operator capability the agent inherits and the strongest evidence anchoring that scope.

Factor | Score | Comments
Code execution | 3 / 4 | Full shell with operator-level privilege; sandbox escape via symlink following confirmed arbitrary file writes outside the workspace. [10]
File system access | 3 / 4 | Read-write access across the home directory; symlink-based deny rule bypass demonstrated reading restricted system files. [8][14]
Network access | 3 / 4 | Unrestricted outbound by default when sandbox is not enabled; SOCKS5 null-byte injection bypassed network allowlist for 5.5 months. [15][16]
Credential access | 3 / 4 | Access to environment variables, API keys, and cloud credentials; bash commands auto-approved without consent led to systematic credential exfiltration. [15][26][32][36]
Autonomous action | 2 / 4 | Autonomous actions gated by permission prompts; bypassPermissions mode available but not the default configuration. [21]
Deployment access | 2 / 4 | Git push restricted to the current working branch; no dedicated deployment or infrastructure-modification tools ship by default. [18]

5 Defense Controls

Defense controls are what the agent's own architecture does to detect, contain, and report attacks before they reach the operator's systems. The vendor documents sandboxing, permission modes, and OpenTelemetry integration, but only the permission prompts are active by default on the CLI distribution; sandboxing and telemetry must be enabled by the operator.

Defense Controls Metrics

Higher scores reflect stronger vendor safeguards; the near-minimum total indicates that meaningful defense is operator-configured rather than vendor-enforced.

Each component is scored on the vendor-implemented default posture; opt-in controls surface as hardening recommendations rather than scored defenses.

Component | Score | Comments
Input Guardrails | 0 / 3 | No prompt shield, injection classifier, or content scanning ships by default; trust verification runs only on first codebase access. [18]
Execution Isolation | 1 / 3 | OS-level sandbox ships with the distribution and requires a single command to enable but is not active by default; multiple sandbox escapes have been silently patched. [19][20][25][27]
Action Controls | 1 / 3 | Default approval gates cover sensitive operations, but bypassPermissions mode removes every check for the session via a single CLI flag. [21][22]
Output Guardrails | 0 / 3 | No DLP, credential redaction, or exfiltration-channel blocking on the default configuration; egress controls rely on the opt-in network sandbox. [20]
Monitoring | 1 / 3 | OpenTelemetry metrics, events, and traces are supported but require manual enablement; no anomaly detection or alerting ships by default. [24][31]
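The component scores above add up to the reported Defense Controls total. The sketch below assumes the total is a plain sum of the five components; the report does not state the aggregation rule, but a plain sum is consistent with the published 3 / 15.

```python
# Component scores copied from the table above; each is scored out of 3.
DEFENSE_COMPONENTS = {
    "Input Guardrails": 0,
    "Execution Isolation": 1,
    "Action Controls": 1,
    "Output Guardrails": 0,
    "Monitoring": 1,
}
MAX_PER_COMPONENT = 3

# Assumption: the headline figure is a simple unweighted sum.
total = sum(DEFENSE_COMPONENTS.values())
maximum = MAX_PER_COMPONENT * len(DEFENSE_COMPONENTS)
zero_rated = [name for name, score in DEFENSE_COMPONENTS.items() if score == 0]

print(f"Defense Controls: {total} / {maximum}")
print("Zero-rated components:", ", ".join(zero_rated))
```

The two zero-rated components, input and output guardrails, are the ones with no default control at all, which is why the hardening tips for them rely on operator-built hooks and proxies.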

6 Hardening Tips

Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. The three highest-leverage changes to the default configuration are enabling the sandboxed bash tool, restricting network egress to a strict domain allowlist, and activating OpenTelemetry audit logging.

Input Guardrails

Input guardrails intercept adversarial content before it reaches the reasoning loop.

  • Policy: Require code review of all CLAUDE.md and .claude/settings.json files in pull requests before merging. This counters the auto-loaded project file ingestion surface.
  • Configuration: Enable managed settings with allowManagedDomainsOnly to block unapproved MCP server domains. This counters unfiltered external data ingestion.
  • Engineering: Deploy a PreToolUse hook running a prompt-injection classifier on Read and WebFetch outputs. This counters the absence of default input filtering.
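The classifier idea in the engineering tip can be sketched as a pattern-based check. This is only an illustration: the patterns, `injection_findings`, and `decide` are hypothetical, a real classifier would need far broader coverage, and the assumption that the hook runner treats exit code 2 as a block should be confirmed against the hooks reference [30].

```python
import re

# Hypothetical phrase patterns; a production classifier would be much broader.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"do not (tell|inform) the user", re.IGNORECASE),
    re.compile(r"curl\s+[^|\n]*\|\s*(ba)?sh", re.IGNORECASE),
]


def injection_findings(text: str) -> list[str]:
    """Return the patterns that matched, so the caller can log why it blocked."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]


def decide(text: str) -> int:
    """Return 0 to allow the content and 2 to block it.

    Assumption: the hook runner interprets exit code 2 as "block".
    """
    return 2 if injection_findings(text) else 0
```

A wrapper script would read the hook payload, pass the file or page content to `decide`, and exit with the returned code. Pattern matching of this kind is easy to evade, so it complements rather than replaces the review and allowlist tips.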

Execution Isolation

Execution isolation contains what a compromised agent can do on the host.

  • Policy: Mandate sandbox-enabled sessions for all developer workstations through managed organizational settings. This counters the opt-in-only isolation default.
  • Configuration: Configure sandbox allowWrite paths to the project directory only and set allowedDomains to the minimum required set. This counters the broad default file system scope.
  • Engineering: Run Claude Code inside a container or gVisor-backed VM with network-none plus a host-side proxy for kernel-level isolation. This counters the shared-kernel sandbox limitation.
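The configuration tip can be checked mechanically before a settings file is rolled out. The sketch below assumes both lists sit under a top-level `sandbox` key; that nesting and the `audit_sandbox` helper are hypothetical, while `allowWrite` and `allowedDomains` are the names used in the tip. Check the actual settings.json layout against the sandbox configuration documentation [20].

```python
from pathlib import PurePosixPath


def audit_sandbox(settings: dict, project_dir: str) -> list[str]:
    """Flag write paths outside the project and wildcard domain entries.

    Assumption: `settings` nests both lists under a "sandbox" key.
    """
    findings = []
    sandbox = settings.get("sandbox", {})
    project = PurePosixPath(project_dir)

    for raw in sandbox.get("allowWrite", []):
        path = PurePosixPath(raw)
        # Reject ".." segments outright, then require the project dir or a child.
        inside = path == project or project in path.parents
        if ".." in path.parts or not inside:
            findings.append(f"write path outside project: {raw}")

    for domain in sandbox.get("allowedDomains", []):
        if "*" in domain:
            findings.append(f"wildcard domain: {domain}")

    return findings


example = {
    "sandbox": {
        "allowWrite": ["/work/repo", "/work/repo/build", "/home/dev"],
        "allowedDomains": ["registry.internal.example", "*.example.com"],
    }
}
for finding in audit_sandbox(example, "/work/repo"):
    print(finding)
```

Running a check like this in CI against the managed settings file keeps the write scope and domain list from drifting wider over time.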

Action Controls

Action controls govern which tools and actions the agent can invoke autonomously.

  • Policy: Set permissions.disableBypassPermissionsMode to "disable" in managed settings across the organization. This counters the single-flag approval bypass.
  • Configuration: Configure explicit deny rules for credential-path access and destructive commands in settings.json. This counters progressive allowlisting accumulation.
  • Engineering: Build a PreToolUse hook that blocks curl, wget, and network commands unless the destination matches an internal allowlist. This counters plugin-driven exfiltration.
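The decision logic for the engineering tip might look like the sketch below. The host names in `ALLOWED_HOSTS` and the `should_block` helper are hypothetical, and the tokenizer is deliberately simple: it does not see commands hidden in subshells or variables, which is the same class of parsing gap behind several of the CVEs cited in this report.

```python
import os
import shlex
from urllib.parse import urlparse

NETWORK_COMMANDS = {"curl", "wget"}
# Hypothetical internal allowlist; replace with the organization's own hosts.
ALLOWED_HOSTS = {"registry.internal.example", "git.internal.example"}


def should_block(command: str) -> bool:
    """Block curl/wget unless every destination host is on the allowlist."""
    try:
        tokens = shlex.split(command)
        if not any(os.path.basename(t) in NETWORK_COMMANDS for t in tokens):
            return False
        hosts = []
        for token in tokens:
            parsed = urlparse(token)
            if parsed.scheme in {"http", "https"} and parsed.hostname:
                hosts.append(parsed.hostname)
    except ValueError:
        # Unbalanced quotes or a malformed URL: fail closed.
        return True
    # A network command with no recognizable URL is blocked as well.
    return not hosts or any(h not in ALLOWED_HOSTS for h in hosts)
```

For example, `should_block("curl https://registry.internal.example/pkg.tgz")` allows the call, while the same command aimed at any other host is blocked. The check fails closed on anything it cannot parse, trading some false positives for fewer silent bypasses.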

Output Guardrails

Output guardrails inspect what the agent sends to other systems and users.

  • Policy: Establish a credential-rotation policy for any secrets accessible during Claude Code sessions. This counters the absence of outbound DLP.
  • Configuration: Restrict sandbox allowedDomains to internal registries and remove wildcard entries to close the SOCKS5 bypass class. This counters network exfiltration channels.
  • Engineering: Deploy a TLS-terminating egress proxy outside the sandbox that inspects and logs all outbound traffic. This counters domain-fronting and null-byte sandbox bypasses.
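To see why removing wildcard entries matters for the null-byte class, compare a suffix check with strict validation plus exact matching. This is an illustration of the general bug class under stated assumptions, not a reconstruction of the vulnerable sandbox-runtime code; all function names and host names are hypothetical.

```python
import re

# One DNS label: letters, digits, hyphens; control characters cannot match.
_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def naive_suffix_allowed(host: str, suffix: str) -> bool:
    """Wildcard-style check that trusts whatever precedes the suffix."""
    return host.endswith(suffix)


def is_valid_hostname(host: str) -> bool:
    """Accept only plain DNS names, so embedded NUL bytes are rejected."""
    if not host or len(host) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in host.lower().split("."))


def host_allowed(host: str, allowed: set[str]) -> bool:
    """Validate first, then require an exact match against the allowlist."""
    return is_valid_hostname(host) and host.lower() in allowed


crafted = "attacker.example\x00.internal.example"
print(naive_suffix_allowed(crafted, ".internal.example"))
print(host_allowed(crafted, {"registry.internal.example"}))
```

The crafted name passes the suffix check, yet a resolver that stops at the NUL byte would contact `attacker.example`. Exact matching on validated names removes that ambiguity.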

Monitoring

Monitoring captures what the agent did and surfaces anomalies for review.

  • Policy: Require CLAUDE_CODE_ENABLE_TELEMETRY=1 in managed environment settings for all deployments. This counters the default-off audit trail.
  • Configuration: Configure OTEL_LOGS_EXPORTER and OTEL_LOG_TOOL_DETAILS=1 to forward structured events to your SIEM. This counters the absence of centralized monitoring.
  • Engineering: Build alerting rules in your SIEM for anomalous permission_mode_changed events and unexpected MCP server connections. This counters the lack of anomaly detection.
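The monitoring tips can be prototyped before committing to SIEM rules. In the sketch below the environment variable names and the `permission_mode_changed` event come from the tips above; the `otlp` exporter value, the `mcp_server_connected` event name, the event dictionary shape, and the approved-server inventory are all assumptions for illustration.

```python
# Environment variables named in the monitoring tips above.
TELEMETRY_ENV = {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_LOGS_EXPORTER": "otlp",  # assumption: export via OTLP to a collector
    "OTEL_LOG_TOOL_DETAILS": "1",
}

# Hypothetical inventory of MCP servers the organization has approved.
APPROVED_MCP_SERVERS = {"internal-docs", "ticketing"}


def missing_telemetry(env: dict) -> list[str]:
    """List the telemetry variables that are unset or set to another value."""
    return [key for key, value in TELEMETRY_ENV.items() if env.get(key) != value]


def alerts(events: list[dict]) -> list[str]:
    """Flag permission-mode changes and connections to unapproved MCP servers.

    Assumption: each event is a dict with a "name" key and, for MCP
    connections, a "server" key; the exported schema may differ.
    """
    found = []
    for event in events:
        name = event.get("name")
        if name == "permission_mode_changed":
            found.append("permission mode changed")
        elif name == "mcp_server_connected":
            server = event.get("server")
            if server not in APPROVED_MCP_SERVERS:
                found.append(f"unexpected MCP server: {server}")
    return found
```

`missing_telemetry(os.environ)` can run as a workstation compliance check, and `alerts` mirrors the two rule types suggested for the SIEM.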

7 References

The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [7][23]) refer to entries below, which are numbered continuously across the groups.

Selected Vulnerabilities

  1. CVE-2025-54795: Command injection via echo command bypassed confirmation prompt for untrusted command execution (CVSS 9.8). Patched in v1.0.20.
  2. CVE-2025-55284: File read plus network exfiltration via overly broad command allowlist enabled DNS-based data theft (CVSS 7.5). Patched in v1.0.4.
  3. CVE-2025-58764: Command parsing error bypassed confirmation prompt to execute untrusted commands (CVSS 9.8). Patched in v1.0.105.
  4. CVE-2025-59536: Code injection via trust dialog bypass allowed execution before user accepted startup trust prompt (CVSS 8.8). Patched in v1.0.111.
  5. CVE-2025-66032: Shell command parsing errors with $IFS and short CLI flags bypassed read-only validation for arbitrary code execution (CVSS 9.8). Patched in v1.0.93.
  6. CVE-2026-24887: Confirmation prompt bypass via find command enabled untrusted command execution from injected context (CVSS 8.8). Patched in v2.0.72.
  7. CVE-2026-25722: Directory change bypassed write protection to the protected .claude folder enabling config modification (CVSS 9.1). Patched in v2.0.57.
  8. CVE-2026-25724: Symlink following bypassed deny rules configured in settings.json to read restricted files (CVSS 7.5). Patched in v2.1.7.
  9. CVE-2026-25725: Sandbox config injection via unprotected settings.json allowed persistent hook injection executing with host privileges (CVSS 10.0). Patched in v2.1.2.
  10. CVE-2026-39861: Sandbox escape via symlink following enabled arbitrary file write outside the workspace without user confirmation (CVSS 10.0). Patched in v2.1.64.

Selected Research

  11. Claude Code DNS Exfiltration via Prompt Injection: Embrace The Red demonstrated DNS-based data exfiltration from Claude Code via indirect prompt injection in file content, leading to the CVE-2025-55284 fix.
  12. Claudy Day Prompt Injection Chain: Oasis Security demonstrated invisible prompt injection via URL parameters chained with exfiltration through the Anthropic Files API on claude.ai.
  13. Hijacking Claude Code via Injected Marketplace Plugins: PromptArmor demonstrated marketplace plugin hooks bypassing human-in-the-loop approval and enabling data exfiltration via prompt injection.
  14. CVE-2026-25724 Symlink Discovery Writeup: Terra Security documented the discovery of the symlink-based deny rule bypass in Claude Code leading to restricted file access.
  15. SOCKS5 Null-Byte Sandbox Bypass: CyberSecurityNews documented the SOCKS5 hostname null-byte injection that bypassed the Claude Code network sandbox for 5.5 months across 130 versions.
  16. SOCKS5 Null-Byte Bypass Proof of Concept: Aonan Guan published the proof-of-concept scripts demonstrating the SOCKS5 null-byte hostname injection against Claude Code sandbox-runtime.
  17. Marketplace Skill Dependency Hijack: Prompt Security demonstrated how a malicious marketplace plugin can redirect dependency installs to trojanized builds in Claude Code.

Vendor Documentation

  18. Claude Code Security Documentation: The vendor security page documents sandboxing, permission model, trust verification, network controls, and cloud isolation for Claude Code.
  19. Claude Code Sandboxing Architecture: Anthropic engineering blog describes the filesystem and network isolation architecture including OS-level sandbox and proxy-based domain allowlisting.
  20. Claude Code Sandbox Configuration: Vendor documentation covers sandbox filesystem boundaries, network proxy enforcement, domain restrictions, and documented security limitations.
  21. Claude Code Permission Modes: Vendor documentation covers all permission modes including default, acceptEdits, auto, bypassPermissions, and administrator controls to disable bypass.
  22. Claude Code Permissions Configuration: Vendor documentation covers allow and deny rules, managed settings enforcement, and organizational policy controls for permission behavior.
  23. Claude Code Memory Documentation: Vendor documentation describes CLAUDE.md project instructions, auto memory persistence, session memory, and enterprise policy memory hierarchy.
  24. Claude Code Monitoring with OpenTelemetry: Vendor documentation covers OpenTelemetry integration for metrics, events, and traces export plus SIEM forwarding for audit security events.
  25. Claude Code Secure Deployment Guide: Vendor documentation covers sandbox-runtime, container isolation, and VM-level isolation options for secure agent deployment.
  26. Anthropic Certifications: Anthropic holds SOC 2 Type I and II, ISO 27001, ISO 42001, and HIPAA-ready configuration at the corporate level.

Other Sources

  27. Anthropic Silently Patches Sandbox Bypass: SecurityWeek documented both Claude Code sandbox bypasses being silently patched without public advisory or user notification.
  28. Claude Code Plugin Supply Chain Audit: Community supply-chain audit tool documenting that marketplace plugins auto-update without integrity verification, code signing, or user notification.
  29. Claude Code Agent Teams Documentation: Vendor documentation covers multi-agent orchestration with lead and teammate coordination, inter-agent messaging, and task management hooks.
  30. Claude Code Hooks Reference: Vendor documentation covers lifecycle hooks including PreToolUse and PostToolUse events, MCP tool hooks, and agent-based hooks for permission decisions.
  31. Claude Code Agent SDK Observability: Vendor documentation covers OpenTelemetry trace and metric export from the Agent SDK for distributed observability across agent sessions.
  32. Anthropic Trust Center: The Anthropic Trust Center portal provides access to SOC 2 reports, ISO certificates, and compliance documentation under access request.
  33. GHSA-jh7p-qr78-84p7 Malicious Repo Config Data Leakage: Malicious repo configuration triggered data leakage via environment configuration used before the trust confirmation dialog appeared.
  34. GHSA-2jjv-qf24-vfm4 Plugin Autoloading Code Execution: Arbitrary code execution via plugin autoloading with specific Yarn versions allowed untrusted code to run during agent orchestration.
  35. GHSA-vhw5-3g5m-8ggf Domain Validation Bypass: Domain validation bypass allowed automatic requests to attacker-controlled domains, circumventing the network allowlist.
  36. Bash Auto-Approved Without Consent Leading to Credential Exfiltration: GitHub issue documenting 48 bash commands auto-approved without user consent in default permission mode, leading to systematic credential exfiltration across the filesystem and process tree.