OpenClaw Agent Security Risks

Category: Computer Agents · Site: openclaw.ai · Quadrant: Exposed Giants
[Figure: AI Risk Quadrant position. Axes: Defense Controls (1) and Attack Surface (8.58). Quadrants: Exposed Giants, Fortified Leaders, Humble Providers, Tight Operators.]
AIRQ Score: 4.24 (High)
Attack Surface: 8.58 (Critical)
Blast Radius: 9.25 (Critical)
Defense Controls: 1 (Critical)
About The Agent

OpenClaw is a self-hosted autonomous computer agent that runs as a persistent gateway on operator-owned hardware. A single operator-scoped runtime drives shell execution, file system access, browser automation, and web-fetch tools. The same runtime accepts inbound prompts from a wide range of messaging clients, auto-loads cross-session memory at startup, and executes community-published skills from an open marketplace. Every channel feeds the same prompt context with the same execution authority at user privilege.

About the AI Risk Quadrant

Exposed Giants describes agents with an extremely high attack surface combined with a high blast radius and near-absent vendor-shipped defense controls on the documented default configuration. OpenClaw inherits this placement for three reasons: host execution is intentionally unrestricted, with sandboxing available only as an opt-in; the skills marketplace has sustained large-scale supply chain incidents; and memory survives restarts without integrity checks while rich messaging outputs lack outbound data-loss controls.

1 Key Risks

The most critical security risks an operator inherits when deploying this agent in its documented default configuration. OpenClaw concentrates risk in the combination of unrestricted host execution authority, absent default sandboxing, and no vendor-shipped observability — every compromise path runs to completion without detection.

Key Input Risks
The gateway accepts untrusted content from messaging channels and marketplace skills without vendor-enforced input filtering. CVE-2026-25253 (CVSS 8.8) proved the control UI itself is an injection vector via WebSocket hijacking. Operators should deploy an external prompt-injection classifier or disable marketplace skill auto-loading.
Key Execution Risks
Tool execution runs on the host at user privilege: YOLO is the default exec mode, and sandboxing is opt-in and OFF by default. CVE-2026-32973 (CVSS 9.8) demonstrated arbitrary command execution via allowlist bypass through POSIX path normalization. The documented isolation tier is absent unless the operator explicitly enables the Docker sandbox.
Key Action Risks
Autonomous actions fire via cron daemon and subagent spawning without per-action operator approval on the documented default. The agent holds full file-system read-write, credential access via OS keychain integration, and unrestricted outbound network scope — all exercisable without approval gates.
Key Output Risks
The web-fetch tool has a demonstrated SSRF bypass (CVE-2026-6011) enabling requests to internal services. Output deserialization trusts upstream content without DLP or redaction, and inter-agent messages pass without content validation through the same channel that GHSA-hr5v-j9h9-xjhg proved escapable.
Key Monitoring Risks
No audit logging, SIEM integration, or anomaly detection ships on the documented default. Operators should prioritize forwarding gateway tool-invocation events to a SIEM as the first observability step — without this, credential access and file mutations proceed without any recorded trail.
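As a rough illustration of that first observability step, the sketch below writes one JSON line per tool invocation, a format most SIEM forwarders can ingest. It assumes the operator can wrap tool calls in their own code. The function name `audit_event` and every field name are hypothetical and are not part of any OpenClaw schema.

```python
import json
import time
from typing import Any, Dict, TextIO


def audit_event(tool: str, args: Dict[str, Any], outcome: str, sink: TextIO) -> Dict[str, Any]:
    """Write one tool-invocation record as a JSON line and return it.

    Field names are illustrative assumptions, not an OpenClaw schema.
    """
    event = {
        "ts": time.time(),         # epoch seconds at the time of logging
        "tool": tool,              # e.g. "exec" or "web-fetch"
        "arg_keys": sorted(args),  # argument names only, so secrets in values are not logged
        "outcome": outcome,        # e.g. "allowed", "denied", "error"
    }
    sink.write(json.dumps(event, sort_keys=True) + "\n")
    return event
```

Logging argument names rather than values is a deliberate choice here: it keeps credentials out of the log at the cost of less forensic detail, and an operator may prefer to log redacted values instead.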

2 AIRQ Scores

The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. OpenClaw lands in the most dangerous composite band where operator hardening is not optional — the agent's own controls contribute almost nothing to incident containment.

AIRQ Metrics

The Exposed Giants placement means operators cannot safely deploy this agent without first layering external controls — the vendor's own architecture contributes almost no incident containment on the documented default.

Attack Surface is scored out of ten, Blast Radius out of ten, Defense Controls out of fifteen, and the AIRQ composite out of fifteen.

Metric | Score | Comments
AIRQ Score | 4.24 | Places in the critical band where the combination of exposure and blast radius overwhelms the near-absent defense posture; operator-layered controls are the only path to containment.
Blast Radius | 9.25 / 10 | Five of six factors at maximum reflect unrestricted host execution, full credential access, and autonomous actions that fire without approval gates.
Attack Surface | 8.58 / 10 | Driven by agent-specific NVD CVEs and vendor advisories across the majority of surfaces, with all three trifecta conditions independently confirmed.
Defense Controls | 1 / 15 | Near-zero because only the exec approval system provides any vendor-shipped control; all other defense components have nothing in place by default.

3 Attack Surface

Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. The dominant exposures are unfiltered multi-channel input ingestion, auto-loaded persistent memory, and a community skill marketplace where installed code inherits full tool authority.

Attack Surface Metrics

Higher scores indicate wider attacker-controllable ingestion with less vendor-enforced filtering between the input channel and the reasoning loop.

Each row ties a canonical input surface to its strongest agent-specific evidence anchor and the resulting adjusted score after any evidence penalty.

Surface | Score | Comments
User Input | 5 / 4 | WebSocket hijacking via crafted gatewayUrl (CVE-2026-25253, CVSS 8.8) proves the control UI accepts attacker-controlled bytes that leak the authentication token to external endpoints. [1]
External Data | 4 / 4 | The gateway ingests content from messaging channels and fetched web pages without vendor-enforced sanitization on the documented default trust model. [15]
Memory | 5 / 4 | Persistent prompt injection through the LanceDB memory system demonstrated end-to-end via auto-capture, direct insertion, and unsanitized recall into the reasoning context. [14]
Reasoning | 3 / 4 | The architecture delegates all reasoning to external LLMs without any intermediate content filter between the prompt and the model response. [15]
Planning | 3 / 4 | Autonomous task decomposition with cron daemon and subagent spawning operates without plan-level approval or scope-limiting gates on the default configuration. [16]
Tool Execution | 5 / 4 | Exec allowlist bypass via POSIX path normalization (CVE-2026-32973, CVSS 9.8) proved arbitrary command execution outside operator-configured restrictions. [4]
Orchestration | 4 / 4 | Plugin subagent routes bypass gateway authorization with synthetic admin scopes, enabling unauthenticated access to privileged runtime methods. [8]
Inter-Agent | 4 / 4 | Sandbox media root bypass via unnormalized parameter keys (GHSA-hr5v-j9h9-xjhg) allows cross-agent file reads from other workspaces without isolation enforcement. [7]
Output Processing | 4 / 4 | SSRF in the web-fetch assertPublicHostname handler (CVE-2026-6011) enables server-side requests to internal services, bypassing the hostname validation. [5]
Configuration | 5 / 4 | Config denylist bypass (CVE-2026-45006) enables compromised models to write persistent policy overrides that alter exec allowlists and credential handling across gateway restarts. [6]

The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. OpenClaw exhibits all three on the documented default: a single injected instruction can read file-system content plus OS keychain credentials and transmit them through messaging channels, web-fetch, or shell commands without crossing any vendor-enforced control.

Lethal Trifecta · Complete (3 of 3)

OpenClaw exhibits all three of these conditions in its documented default configuration:

  • Untrusted input — The gateway ingests content from marketplace skills, fetched web pages, and messaging channels — all authored by parties other than the operator. [1]
  • Sensitive data — The agent reads the full file system at user privilege, source code repositories, and OS keychain credentials accessible to the operator's shell session. [15]
  • External egress — Outbound HTTP from web-fetch, shell commands capable of network calls, and inter-agent message passing all constitute default channels that send bytes externally. [5]
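The three conditions above can be expressed as a simple per-session check. This is a minimal sketch, assuming an operator can summarize a session's capabilities as three booleans; the `SessionCapabilities` type and both function names are hypothetical and do not correspond to any OpenClaw API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SessionCapabilities:
    """Hypothetical summary of what one agent session can do."""
    reads_untrusted_input: bool  # e.g. marketplace skills, fetched pages, chat messages
    reads_sensitive_data: bool   # e.g. file system content, keychain credentials
    can_send_externally: bool    # e.g. outbound HTTP, shell commands with network access


def trifecta_conditions_met(caps: SessionCapabilities) -> int:
    """Count how many of the three trifecta conditions hold (0 to 3)."""
    return sum([
        caps.reads_untrusted_input,
        caps.reads_sensitive_data,
        caps.can_send_externally,
    ])


def is_lethal_trifecta(caps: SessionCapabilities) -> bool:
    """The trifecta is complete only when all three conditions hold together."""
    return trifecta_conditions_met(caps) == 3


# The documented default described above: all three conditions hold.
default_session = SessionCapabilities(True, True, True)
# Removing any one condition (here, external egress) breaks the chain.
egress_gated = replace(default_session, can_send_externally=False)
```

The point of the check is that an operator does not need to remove all three conditions: gating any single one per session is enough to leave the trifecta incomplete.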

4 Blast Radius

The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. Compromise of the agent equals compromise of the operator's host — unrestricted code execution, full credential scope, and autonomous actions without approval gates.

Blast Radius Metrics

Higher blast scores indicate wider operator-asset reachability from a single successful injection into the agent's reasoning loop.

Each row ties a blast factor to the specific workflow node or credential scope that determines the maximum damage from a compromised session.

Factor | Score | Comments
Code execution | 4 / 4 | Exec allowlist bypass (CVE-2026-32973, CVSS 9.8) grants arbitrary shell execution at user privilege with no sandbox enforced by default. [4]
File system access | 4 / 4 | Sandbox media root bypass (GHSA-hr5v-j9h9-xjhg) demonstrates full cross-workspace file reads via unnormalized parameter keys escaping isolation. [7]
Network access | 4 / 4 | SSRF bypass in web-fetch (CVE-2026-6011) proves unrestricted outbound network access from the tool layer on the default configuration. [5]
Credential access | 4 / 4 | OS command injection in the macOS keychain credential refresh path (CVE-2026-27487, CVSS 8.0) proves access to stored OAuth tokens. [2]
Autonomous action | 4 / 4 | The default exec mode allows all tool calls to proceed without per-invocation operator approval, and the background daemon runs scheduled tasks continuously. [15]
Deployment access | 2 / 4 | The agent can build and deploy Docker containers from workspace content, reaching operator infrastructure when Docker is available on the host. [16]

5 Defense Controls

Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. The vendor publishes security documentation and an exec approval system but ships no active guardrails on the default configuration — operators inherit all detection and containment responsibility.

Defense Controls Metrics

Higher defense scores indicate stronger vendor-shipped safeguards that reduce risk without operator action — the inverted coloring reflects that absence is the dangerous state.

Each row scores whether the vendor implements the control by default or whether the operator must build and maintain it independently.

Component | Score | Comments
Input Guardrails | 0 / 3 | No prompt-injection filter, content classifier, or input sanitization ships on the default configuration; academic red-team assessment confirmed only a 17 percent native defense rate. [15][10]
Execution Isolation | 0 / 3 | Sandboxing is documented but opt-in and OFF by default; the exec runtime operates directly on operator hardware without containerization. [16]
Action Controls | 1 / 3 | The exec approval system provides YOLO preset, progressive allowlist, and session-level bypass. The system exists, but the permissive default (YOLO) negates its value. [17]
Output Guardrails | 0 / 3 | No DLP, output redaction, or URL sanitization is documented or shipped on the default configuration; outbound content passes unchecked. [15]
Monitoring | 0 / 3 | The vendor publishes no monitoring integration, event forwarding, or detection capability; operators must build all observability from scratch using external tooling. [15]

6 Hardening Tips

Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. Operators should prioritize breaking the trifecta condition by gating autonomous execution and deploying input filtering before addressing observability gaps.

Input Guardrails

Input guardrails intercept adversarial content before it reaches the reasoning loop.

  • Policy: Require manual review of all marketplace skills before installation. Counters the supply chain attack surface where malicious skills were confirmed at registry scale. [12][19]
  • Configuration: Deploy a content-filtering proxy between the gateway and messaging channels. Counters the unfiltered multi-channel input ingestion that feeds directly into the reasoning loop.
  • Engineering: Wire a prompt-injection classifier into the gateway ingestion pipeline. Counters the demonstrated persistent injection via LanceDB memory auto-capture.
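One way to enforce the manual-review policy is to pin each reviewed skill by content digest and refuse anything else. The sketch below assumes skills are distributed as files the operator can hash before installation; the function names and the idea of a `reviewed_digests` set are illustrative assumptions, not an OpenClaw feature.

```python
import hashlib
from typing import Set


def skill_digest(skill_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of a skill's exact byte content."""
    return hashlib.sha256(skill_bytes).hexdigest()


def may_install(skill_bytes: bytes, reviewed_digests: Set[str]) -> bool:
    """Allow installation only if this exact content was reviewed.

    Any change to the skill after review, even one byte, produces a
    different digest and is rejected until it is reviewed again.
    """
    return skill_digest(skill_bytes) in reviewed_digests
```

Pinning by digest rather than by skill name means a marketplace update cannot silently replace reviewed content with unreviewed content under the same name.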

Execution Isolation

Execution isolation contains what a compromised agent can do on the host.

  • Policy: Mandate that all deployments enable the Docker sandbox backend before first use. Counters the host-resident execution surface where tools run at user privilege by default.
  • Configuration: Enable the Docker sandbox backend and restrict workspace access mode to read-only. Counters the file-system blast radius demonstrated by the sandbox media root bypass.
  • Engineering: Implement gVisor or Firecracker micro-VM isolation for the exec runtime. Counters the allowlist bypass that grants arbitrary shell at user privilege. [3][13]
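To make the read-only, isolated execution idea concrete, the sketch below builds a generic `docker run` argument list with no network, a read-only root file system, dropped capabilities, and a read-only workspace mount. It uses standard Docker CLI flags only. It is not OpenClaw's sandbox backend or its configuration format, and the function name and the `/workspace` mount point are assumptions made for illustration.

```python
from typing import List


def sandboxed_argv(image: str, workspace: str, command: List[str]) -> List[str]:
    """Build a `docker run` argv for a locked-down, throwaway container.

    This only constructs the argument list; it does not start a container.
    """
    return [
        "docker", "run", "--rm",                   # remove the container when it exits
        "--network", "none",                       # no network access from inside
        "--read-only",                             # read-only root file system
        "--cap-drop", "ALL",                       # drop all Linux capabilities
        "--security-opt", "no-new-privileges",     # block privilege escalation via setuid
        "--volume", f"{workspace}:/workspace:ro",  # workspace mounted read-only
        "--workdir", "/workspace",
        image,
        *command,
    ]
```

On a host with Docker installed, the returned list could be passed to `subprocess.run(argv, check=True)`. Commands that need scratch space would additionally need a writable temporary mount, which this sketch omits.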

Action Controls

Action controls govern which tools and actions the agent can invoke autonomously.

  • Policy: Disable YOLO mode and enforce per-action approval for all destructive operations. Counters the autonomous cron and subagent execution that fires without operator consent.
  • Configuration: Set the exec approval mode to require explicit per-command opt-in instead of the default blanket-allow preset. Counters the session-level bypass that accumulates permissions silently.
  • Engineering: Build a policy engine that gates tool invocation on runtime context signals. Counters the credential access path that was reachable without approval. [9][11]
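A policy gate for command execution can be sketched as a strict allowlist check. The version below is deliberately conservative: it accepts only absolute paths that are already in canonical form and compares them exactly, with no lowercasing and no glob matching. It is an illustrative sketch, not a reproduction of OpenClaw's allowlist logic, and `is_allowed_executable` is a hypothetical name.

```python
import posixpath
from typing import FrozenSet


def is_allowed_executable(path: str, allowlist: FrozenSet[str]) -> bool:
    """Return True only for an exact, canonical, absolute allowlisted path.

    Assumption: allowlist entries are absolute POSIX paths. Symlinks are
    not resolved here; a real deployment would also compare against the
    resolved path (for example via os.path.realpath) before executing.
    """
    if not path.startswith("/"):
        return False  # relative names depend on PATH and cwd: fail closed
    if posixpath.normpath(path) != path:
        return False  # reject "..", ".", doubled or trailing slashes
    return path in allowlist  # exact, case-sensitive match
```

Rejecting non-canonical input instead of normalizing it avoids having two code paths (the matcher and the operating system) disagree about which file a string refers to.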

Output Guardrails

Output guardrails inspect what the agent sends to other systems and users.

  • Policy: Deploy a DLP proxy on all outbound channels before external transmission. Counters the unrestricted egress proven by the SSRF bypass in web-fetch. [18]
  • Configuration: Implement URL sanitization and response validation on the web-fetch tool. Counters the assertPublicHostname bypass that enabled internal-service requests.
  • Engineering: Add output-content scanning for sensitive data patterns before inter-agent messaging. Counters the cross-agent file-read surface where sandbox boundaries were escapable.
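The URL sanitization idea can be sketched as a pre-flight check that resolves the hostname and rejects the request unless every resolved address is globally routable. This is a minimal illustration using only the Python standard library. It is not the web-fetch tool's actual validation, and the function names are hypothetical.

```python
import ipaddress
import socket
from typing import Callable, List
from urllib.parse import urlsplit


def is_public_ip(address: str) -> bool:
    """True only for globally routable addresses (not loopback, private, or link-local)."""
    try:
        return ipaddress.ip_address(address).is_global
    except ValueError:
        return False  # unparseable address: fail closed


def resolve_all(hostname: str) -> List[str]:
    """Resolve a hostname to every address the system resolver returns."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def url_targets_public_host(
    url: str, resolver: Callable[[str], List[str]] = resolve_all
) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to public addresses."""
    try:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            return False
        addresses = resolver(parts.hostname)
    except (OSError, ValueError):
        return False  # malformed URL or resolution failure: fail closed
    return bool(addresses) and all(is_public_ip(a) for a in addresses)
```

A check like this is only effective if the subsequent connection goes to the same address that was validated; otherwise a second DNS lookup can return a different, internal address. Redirects need the same check applied at every hop.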

Monitoring

Monitoring captures what the agent did and surfaces anomalies for review.

  • Policy: Forward all gateway events to a SIEM with retention and alerting rules. Counters the complete absence of audit logging on the default configuration.
  • Configuration: Implement anomaly detection on tool invocation patterns and credential access frequency. Counters the silent credential exfiltration path demonstrated by the keychain injection.
  • Engineering: Instrument the LanceDB memory system with tamper-detection hooks. Counters the persistent prompt injection chain that silently poisons future sessions without triggering any alert. [14]
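A tamper-detection hook can be as simple as storing a keyed hash next to each memory record and verifying it on recall. The sketch below uses HMAC-SHA256 from the Python standard library. It does not touch LanceDB itself, and the function names and the idea of storing a tag per record are assumptions made for illustration.

```python
import hashlib
import hmac


def seal(record: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag to store alongside a memory record."""
    return hmac.new(key, record, hashlib.sha256).hexdigest()


def is_untampered(record: bytes, tag: str, key: bytes) -> bool:
    """Verify on recall that the record still matches its stored tag.

    compare_digest performs a constant-time comparison.
    """
    return hmac.compare_digest(seal(record, key), tag)
```

This detects records that were modified or inserted outside the sealing path. It does not detect malicious content that was captured and sealed through the normal write path, so it complements input filtering rather than replacing it. The key must be stored where the agent's own tools cannot read it.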

7 References

The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [7, 13]) refer to entries below, listed in citation order.

Selected Vulnerabilities

  1. CVE-2026-25253. One-click RCE via WebSocket hijacking (CVSS 8.8). The Control UI accepted a crafted gatewayUrl from the query string and automatically established a WebSocket connection without validation, leaking the stored authentication token to an attacker-controlled endpoint. Patched in 2026.1.29.
  2. CVE-2026-27487. OS command injection in macOS keychain credential refresh (CVSS 8.0). The CLI constructed a shell command using user-controlled OAuth token data to write into Keychain, creating an injection vector. Patched in 2026.2.14.
  3. CVE-2026-28479. Sandbox cache poisoning via deprecated SHA-1 hash (CVSS 9.1). Docker and browser sandbox configuration cache keys used SHA-1, enabling collision-based cache poisoning that allows unsafe sandbox state reuse. Patched in 2026.2.15.
  4. CVE-2026-32973. Exec allowlist bypass via POSIX path normalization (CVSS 9.8). The matchesExecAllowlistPattern function improperly normalized patterns with lowercasing and glob matching, allowing command execution outside operator intent. Patched in 2026.3.11.
  5. CVE-2026-6011. SSRF in web-fetch assertPublicHostname handler. The web-fetch tool could be manipulated to make server-side requests to internal services. Patched in 2026.1.29.
  6. CVE-2026-45006. Improper access control in gateway config operations. Compromised models could bypass an incomplete denylist to persist malicious configuration changes affecting command execution and credentials across restarts. Patched in 2026.4.23.
  7. GHSA-hr5v-j9h9-xjhg. Sandbox media root bypass via unnormalized mediaUrl/fileUrl parameter keys (CWE-22). A sandboxed agent could read arbitrary files from other workspaces by using parameter keys that escape normalization. Patched in 2026.3.24.
  8. GHSA-xw77-45gv-p728. Plugin subagent routes bypassed gateway authorization with synthetic admin scopes. A remote unauthenticated request to a plugin-owned route could reach privileged subagent runtime methods. Patched in 2026.3.11.
  9. CVE-2026-45004. Arbitrary code execution via setup-api.js loaded from process.cwd (CVSS 8.4). Attackers could place a malicious file in a repository and gain execution when an operator ran commands from that directory. Patched in 2026.4.23.

Selected Research

  10. Don't Let the Claw Grip Your Hand. Academic security analysis testing 47 adversarial scenarios against OpenClaw across six attack categories derived from MITRE frameworks, demonstrating a native defense rate of only 17 percent.
  11. MITRE ATLAS OpenClaw Investigation. The MITRE Center for Threat-Informed Defense analyzed critical OpenClaw incidents, identified seven new ATLAS techniques unique to the platform, and mapped attack flows to TTPs and mitigations.
  12. ClawHavoc supply chain audit. Koi Security audited 2,857 ClawHub skills and confirmed 341 were malicious from a single campaign, later growing to 824 malicious skills across an expanded registry of more than 10,700 entries.
  13. BrokenClaw Part 2 sandbox escape. An independent researcher demonstrated a multi-layer prompt injection that escapes the sub-agent sandbox and forces the main agent to execute attacker instructions, achieving zero-click RCE.
  14. Memory-LanceDB injection chain. Security issue documenting the complete persistent prompt injection chain through the LanceDB memory system via auto-capture, direct insertion, and unsanitized recall into context.

Vendor Documentation

  15. OpenClaw Security Documentation. The vendor's security guidance documents the single-operator trust model, confirms sandboxing is opt-in, and states that YOLO is the default host exec behavior.
  16. OpenClaw Sandboxing Documentation. Documents the Docker sandbox backend, the tools.elevated escape hatch, and workspace access modes, and confirms sandboxing is OFF by default with tools running on the host.
  17. Exec Approvals Documentation. Documents the exec approval system, including the YOLO preset, the session-level security=full bypass, progressive allowlist accumulation, and the explicit escape hatch for sandboxed execution.

Other Sources

  18. What OpenClaw revealed about the agent security model. SC Media analysis examining the architectural security failures that ClawHavoc exploited, including the absence of mandatory skill vetting, lack of runtime permission enforcement, and token exfiltration through the local gateway.
  19. Tencent Zhuque Lab 50K skill scan. Tencent security laboratory scanned nearly 50,000 ClawHub skills and confirmed 1,184 malicious entries, including data-stealing trojans responsible for 247,000 installations and 2.3 million dollars in cryptocurrency theft.