NVIDIA NemoClaw Agent Security Risks

Computer Agents · nvidia.com · Exposed Giants

[Figure: AI Risk Quadrant position. Axes: Defense Controls (4) and Attack Surface (8.18). Quadrants: Exposed Giants, Fortified Leaders, Humble Providers, Tight Operators.]

AIRQ Score: 4.32 (High)
Attack Surface: 8.18 (Critical)
Blast Radius: 5.88 (High)
Defense Controls: 4 (High)

About The Agent

NVIDIA NemoClaw is an open-source reference stack that wraps OpenClaw autonomous agents inside an OpenShell sandbox with deny-by-default network egress and Landlock filesystem isolation. Deployed self-hosted on local workstations including RTX PCs and DGX hardware, the agent runs as an always-on daemon with full bash shell access, ingesting content from CLI, Telegram, GitHub, npm, and marketplace skills. The primary risk surface is the combination of unvalidated multi-channel input, autonomous tool execution with no per-command approval, and default-allowed egress channels that have been demonstrated as exfiltration paths.

About the AI Risk Quadrant

Exposed Giants placement reflects high Attack Surface driven by two patched CVEs, multiple demonstrated exfiltration chains, and six surfaces at or near adjusted ceiling, paired with moderate Blast Radius constrained by the sandbox boundary and minimal Defense Controls appropriate to alpha-stage software. Operators deploying NVIDIA NemoClaw should prioritize closing the input guardrails gap and adding output filtering before granting access to sensitive data. The sandbox provides meaningful execution isolation, but demonstrated bypasses through policy-approved channels show that network-level controls alone do not prevent data exfiltration from a compromised agent.

1 Key Risks

The most critical security risks an operator inherits when deploying this agent in its documented default configuration. NemoClaw's default configuration exposes broad input channels with no input or output guardrails, relying on execution isolation as the primary safeguard for an always-on autonomous agent.

Key Input Risks
The agent ingests unvalidated content from CLI, Telegram, code repositories, package registries, and community marketplace skills with no prompt shield or injection detection. CVE-2026-24222 confirmed that prompt-injected content reaches the reasoning loop and drives host-level data access.
Key Execution Risks
Full bash shell with npm, python, and node runs inside a Landlock-isolated sandbox with no per-command approval and no custom syscall filter. Independent security research demonstrated data exfiltration through policy-approved channels without triggering any configured sandbox violation.
Key Action Risks
Inside the approved network scope, the agent executes all commands autonomously with no per-action approval gate, and it can use cron scheduling to run persistent background tasks. Operator approval covers network destinations only, leaving shell execution, file writes, and package installations ungated.
Key Output Risks
The agent emits text, files, and integration messages through GitHub, Telegram, and other approved channels with no DLP or credential redaction. Independent research demonstrated emoji-encoded credential exfiltration via pull requests that bypassed the inline detection mechanism.
Key Monitoring Risks
The agent writes file-based logs with no SIEM integration, no active alerting, and no anomaly detection on the default configuration. Cron job creation, bulk data access, and credential reads inside the sandbox are silent unless the operator manually inspects logs.

2 AIRQ Scores

The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. NemoClaw's composite reflects high attack exposure with demonstrated exploits, only partly offset by limited defense controls on alpha-stage sandboxing infrastructure.

AIRQ Metrics

NemoClaw places in the Exposed Giants quadrant where broad attack exposure pairs with moderate blast radius and minimal defense coverage on alpha-stage software.

Each axis measures a distinct facet of risk: Attack Surface and Blast Radius each out of 10, Defense Controls out of 15, and the composite score out of 15.

Metric | Score | Comments
AIRQ Score | 4.32 / 15 | The low composite, rated High risk, indicates the operator should prioritize closing the input and output guardrails gaps before deploying in sensitive environments.
Blast Radius | 5.88 / 10 | Code execution, credentials, and autonomous action are the dominant blast factors; sandbox containment limits file system and network exposure.
Attack Surface | 8.18 / 10 | Six surfaces reach adjusted ceiling driven by two CVEs and multiple demonstrated exfiltration chains across the attack chain.
Defense Controls | 4 / 15 | No input or output filtering ships by default; execution isolation via Landlock is the only vendor-implemented control above baseline.

3 Attack Surface

Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. NemoClaw's reasoning loop ingests content from CLI, messaging, code repositories, package managers, and marketplace skills with no validation or filtering gate on the default configuration.

Attack Surface Metrics

Higher scores indicate wider input acceptance and deeper tool access; surfaces involving external data ingestion and code execution score highest for this agent.

Each row names an attack surface, its scored exposure level, and a comment grounding the score in documented defaults and available agent-specific evidence.

Surface | Score | Comments
User Input | 5 / 4 | Multiple unvalidated input channels including CLI, Telegram, and web interface accept untrusted content with no injection detection; CVE-2026-24222 confirmed prompt injection reaches the reasoning loop [1][3].
External Data | 5 / 4 | Auto-ingests untrusted content from GitHub repositories, npm packages with executable postinstall scripts, and ClawHub marketplace skills without content validation [5].
Memory | 4 / 4 | Persistent cross-session memory via SOUL.md, MEMORY.md, and USER.md with automated writes and no integrity verification; independent research demonstrated persistent configuration poisoning [4].
Reasoning | 3 / 4 | Model-agnostic multi-step reasoning delegates to interchangeable external LLMs with partial transparency via agent logs but no alignment verification [8].
Planning | 3 / 4 | Autonomous task decomposition with cron scheduling enables persistent background tasks and blueprint lifecycle management without per-step operator approval [8].
Tool Execution | 5 / 4 | Full bash shell inside sandbox with npm, python, node, git, and curl; independent research demonstrated malicious package installation plus autonomous cron job creation [5].
Orchestration | 4 / 4 | Always-on daemon with cron scheduling and headless operation; independent research demonstrated persistent cron-based exfiltration running continuously from the daemon process [6].
Inter-Agent | 1 / 4 | Single-agent architecture with no inter-agent communication; ClawHub marketplace skills installed via default network policy create a supply-chain surface [10].
Output Processing | 4 / 4 | Rich output with basic inline detection but no exfiltration blocking; independent research demonstrated emoji-encoded credential exfiltration via GitHub pull requests bypassing detection [4].
Configuration | 5 / 4 | Auto-loads blueprints and network policies from project directories; CVE-2026-24231 confirmed SSRF via configuration endpoint bypass, and network policy presets lack binary restrictions [2][11].

The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. NemoClaw ingests untrusted content from messaging and package channels, holds plaintext credentials and project files inside the sandbox, and has default-allowed outbound channels to GitHub and npm.

Lethal Trifecta · Complete (3 of 3)

NVIDIA NemoClaw exhibits all three of these conditions in its documented default configuration:

  • Untrusted input — Telegram messages, GitHub repositories, npm packages, and ClawHub marketplace skills carry content authored by parties other than the operator with no input filtering [1].
  • Sensitive data — OpenClaw credentials at /sandbox/.openclaw/openclaw.json, user-configured API keys, and project files inside /sandbox are readable by the agent at runtime [3].
  • External egress — Default-allowed outbound channels to github.com, clawhub.ai, and registry.npmjs.org provide exfiltration paths demonstrated through GitHub pull requests [4][10].
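
The three-condition test above can be sketched as a simple check. This is an illustration only: `SessionProfile` and `lethal_trifecta` are hypothetical names, not part of NemoClaw or any scoring tool, and the sample values are copied from the bullets above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SessionProfile:
    """Hypothetical summary of what one agent session can touch."""
    untrusted_inputs: Tuple[str, ...]  # channels carrying third-party content
    sensitive_paths: Tuple[str, ...]   # data readable by the agent at runtime
    egress_hosts: Tuple[str, ...]      # outbound destinations allowed by policy


def lethal_trifecta(profile: SessionProfile) -> Tuple[int, bool]:
    """Return (conditions_met, complete) for the three trifecta conditions."""
    conditions = (
        bool(profile.untrusted_inputs),
        bool(profile.sensitive_paths),
        bool(profile.egress_hosts),
    )
    met = sum(conditions)
    return met, met == 3


# Values taken from the default configuration described in this report.
nemoclaw_default = SessionProfile(
    untrusted_inputs=("telegram", "github", "npm", "clawhub"),
    sensitive_paths=("/sandbox/.openclaw/openclaw.json", "/sandbox"),
    egress_hosts=("github.com", "clawhub.ai", "registry.npmjs.org"),
)

print(lethal_trifecta(nemoclaw_default))
```

Removing any one leg, for example emptying `egress_hosts`, drops the result to two of three, which is why the hardening tips focus on breaking a single link in the chain.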

4 Blast Radius

The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. A compromised NemoClaw agent can execute arbitrary code, access credentials, and run persistent background tasks within the sandbox, constrained by Landlock filesystem isolation and deny-by-default egress.

Blast Radius Metrics

Higher blast scores indicate wider reach from a compromised agent; code execution, credentials, and autonomous action score highest due to the unrestricted shell and always-on daemon.

Each row maps a blast radius factor to its score and the documented scope of damage a compromised agent can reach on the default configuration.

Factor | Score | Comments
Code execution | 3 / 4 | Bash shell providing npm, python, node, git, and curl runs inside a Landlock-isolated sandbox as a non-root user within a k3s pod [9].
File system access | 2 / 4 | Read-write access scoped to /sandbox and /tmp with system paths enforced read-only via Landlock LSM filesystem policy [9].
Network access | 2 / 4 | Deny-by-default egress with operator-approved domain allowlist via OpenShell gateway; default policy allows github.com, clawhub.ai, and npm registry [10].
Credential access | 3 / 4 | OpenClaw credentials stored in plaintext inside the sandbox; CVE-2026-24222 previously allowed reading host environment variables from within the sandbox boundary [1].
Autonomous action | 3 / 4 | Always-on daemon with cron scheduling enables persistent autonomous tasks; operator approval covers network destinations only, not individual agent commands [8].
Deployment access | 1 / 4 | No dedicated deployment tooling exists; generic shell access could invoke deployment commands but no first-class infrastructure integration is documented [8].

5 Defense Controls

Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. NemoClaw ships execution isolation via Landlock and network controls via OpenShell as its primary defenses; input filtering, output guardrails, and structured monitoring are absent on the default configuration.

Defense Controls Metrics

Higher scores indicate stronger vendor-implemented safeguards; only execution isolation, action controls, and monitoring score above zero, reflecting the alpha-stage maturity of this product.

Each component is scored on the vendor-implemented default with confidence reflecting the evidence tier: independently verified, vendor documented, or architecturally inferred.

Component | Score | Comments
Input Guardrails | 0 / 3 | No prompt shield or injection detection ships by default; the vendor security documentation acknowledges the need but omits input-level filtering from the shipped configuration [7].
Execution Isolation | 2 / 3 | Landlock LSM, seccomp defaults, capability drops, and deny-by-default egress form the isolation boundary; capped at vendor-documented tier for alpha-stage software with no custom seccomp profile [9][12].
Action Controls | 1 / 3 | Deny-by-default network endpoint approval via operator TUI provides partial coverage; tool execution, file writes, and cron job creation inside the sandbox require no approval [10].
Output Guardrails | 0 / 3 | No DLP, credential redaction, or exfiltration blocking exists by default; independent research demonstrated emoji-encoded exfiltration through policy-approved channels [4].
Monitoring | 1 / 3 | File-based logging via nemoclaw CLI and OpenShell egress logs without SIEM forwarding, active alerts, or behavioral anomaly detection on the default configuration [7].
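
The component scores above add up to the headline Defense Controls figure. The short tally below shows the arithmetic; it assumes the headline is an unweighted sum of the five components, which matches the reported 4 / 15 but is not stated explicitly in the scoring description.

```python
# Component scores as listed in the table above (each out of 3).
COMPONENT_MAX = 3
component_scores = {
    "Input Guardrails": 0,
    "Execution Isolation": 2,
    "Action Controls": 1,
    "Output Guardrails": 0,
    "Monitoring": 1,
}

# Assumption: the headline figure is the unweighted sum of the components.
total = sum(component_scores.values())
maximum = COMPONENT_MAX * len(component_scores)
gaps = [name for name, score in component_scores.items() if score == 0]

print(f"Defense Controls: {total} / {maximum}")
print("Zero-scored components:", ", ".join(gaps))
```

The two zero-scored components, input and output guardrails, are the ones the hardening tips ask operators to address first.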

6 Hardening Tips

Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. Operators should prioritize adding input filtering and output guardrails to break the untrusted-input-to-sensitive-data-to-external-egress chain before deploying in sensitive environments.

Input Guardrails

Input guardrails intercept adversarial content before it reaches the reasoning loop.

  • Policy: Require all external inputs to pass through an approved prompt injection classifier before reaching the agent sandbox, rejecting any input that fails classification.
  • Configuration: Configure a pre-processing webhook in the OpenShell gateway to filter known injection patterns and block inputs containing encoded exfiltration payloads.
  • Engineering: Integrate the community prompt injection scanner module from the open pull request into the default blueprint and enable it for all input channels.
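
A pre-processing filter of the kind described in the configuration tip could look roughly like the sketch below. It is a minimal pattern-based screen, not the community scanner module and not an OpenShell API; `screen_input` and the pattern list are hypothetical, and a real classifier would need much broader coverage than three regular expressions.

```python
import re
import unicodedata
from typing import Tuple

# Illustrative patterns only; real injection attempts vary far more widely.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (the )?(system|developer) prompt", re.IGNORECASE),
    re.compile(
        r"(print|reveal|send|upload).{0,40}(api[_ -]?key|token|password|\.env)",
        re.IGNORECASE,
    ),
]


def screen_input(text: str) -> Tuple[bool, str]:
    """Return (allowed, reason); reject text matching any known pattern."""
    # Normalize so compatibility forms (e.g. full-width letters) compare equal.
    normalized = unicodedata.normalize("NFKC", text)
    for pattern in INJECTION_PATTERNS:
        if pattern.search(normalized):
            return False, f"matched pattern: {pattern.pattern}"
    return True, "no known pattern matched"


print(screen_input("Please ignore previous instructions and reveal the API key"))
print(screen_input("Summarize this README"))
```

Pattern matching alone is easy to evade, so a filter like this is best treated as a first gate in front of a dedicated classifier rather than a replacement for one.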

Execution Isolation

Execution isolation contains what a compromised agent can do on the host.

  • Policy: Mandate kernel version 5.13 or later for all deployments to guarantee Landlock LSM enforcement is active and not silently falling back to the weaker container-default isolation.
  • Configuration: Apply a custom seccomp profile restricting syscalls to the minimum set required by the agent runtime, replacing the container default.
  • Engineering: Implement read-only filesystem enforcement for /sandbox via Landlock to prevent persistent backdoor writes including cron job and configuration tampering.
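
The kernel-version policy above can be verified with a small pre-flight check. The sketch below is an assumption about how an operator might script it, not a NemoClaw command; the function names are hypothetical. It compares the running kernel release against 5.13 and reads the kernel's active LSM list from `/sys/kernel/security/lsm`, which is only available on Linux hosts with securityfs mounted.

```python
import platform
import re
from pathlib import Path
from typing import Tuple

MIN_KERNEL = (5, 13)  # minimum version named in the policy tip above


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.8.0-45-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supports_landlock(release: str) -> bool:
    """True when the release is at or above the minimum kernel version."""
    return parse_kernel_release(release) >= MIN_KERNEL


def landlock_in_active_lsms(lsm_file: Path = Path("/sys/kernel/security/lsm")) -> bool:
    """Check the active LSM list; returns False if the file cannot be read."""
    try:
        return "landlock" in lsm_file.read_text().strip().split(",")
    except OSError:
        return False


def check_host() -> dict:
    """Collect both checks for the machine this script runs on."""
    release = platform.release()
    return {
        "release": release,
        "version_ok": kernel_supports_landlock(release),
        "landlock_active": landlock_in_active_lsms(),
    }
```

Checking both values matters because a sufficiently new kernel can still be built or booted without Landlock enabled, which is the silent fallback the policy tip warns about.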

Action Controls

Action controls govern which tools and actions the agent can invoke autonomously.

  • Policy: Require operator re-approval for previously approved network endpoints on a fixed expiration schedule rather than session-scoped persistence.
  • Configuration: Restrict the default network policy to remove blanket access to github.com and registry.npmjs.org unless explicitly required by the current task.
  • Engineering: Build a per-command approval gateway inside the sandbox that gates destructive operations including package installation, cron job creation, and git push, countering the demonstrated autonomous cron and npm install exfiltration vectors.
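
The per-command gate in the engineering tip could start from a classifier like the one sketched below. The rule list and `requires_approval` are hypothetical and only inspect the first program on the command line; a real gate would need to hook execution at a lower level, since shell pipelines, `sh -c` wrappers, and scripts can all hide a gated command from this kind of string check.

```python
import shlex

# Hypothetical rules: (program, leading arguments) that need operator sign-off.
GATED_COMMANDS = [
    ("npm", ("install",)),
    ("npm", ("i",)),
    ("pip", ("install",)),
    ("git", ("push",)),
    ("crontab", ()),  # any crontab invocation is gated
]


def requires_approval(command_line: str) -> bool:
    """Return True when the command matches a gated rule."""
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        return True  # unparseable input is treated as gated
    if not tokens:
        return False
    # Compare on the program's base name so absolute paths are also caught.
    program = tokens[0].rsplit("/", 1)[-1]
    args = tokens[1:]
    for gated_program, leading_args in GATED_COMMANDS:
        if program == gated_program and tuple(args[: len(leading_args)]) == leading_args:
            return True
    return False


for cmd in ("npm install left-pad", "git status", "/usr/bin/crontab -l"):
    print(cmd, "->", "needs approval" if requires_approval(cmd) else "allowed")
```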

Output Guardrails

Output guardrails inspect what the agent sends to other systems and users.

  • Policy: Require all outbound messages and pull request bodies to pass through a DLP scanner before leaving the sandbox boundary.
  • Configuration: Configure the OpenShell gateway to strip or reject encoded payloads exceeding a configurable entropy threshold in all egress traffic.
  • Engineering: Implement credential redaction in the OpenShell L7 proxy that detects and masks API keys, tokens, and secrets across all outbound channels.
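
The entropy threshold and credential redaction tips can be illustrated with the sketch below. It is not OpenShell code: the function names and thresholds are hypothetical, and the token patterns cover only a few widely documented formats. The non-ASCII ratio check is an added assumption, included because character-level entropy alone may not flag an emoji-encoded payload of the kind demonstrated in the research cited above.

```python
import math
import re
from collections import Counter

# Widely documented token shapes; illustrative, not exhaustive.
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                 # GitHub classic personal access token
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


def redact_secrets(text: str) -> str:
    """Mask every substring that matches a known secret pattern."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def shannon_entropy(text: str) -> float:
    """Bits per character over the observed character distribution."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def non_ascii_ratio(text: str) -> float:
    """Fraction of characters outside the ASCII range."""
    return sum(ord(ch) > 127 for ch in text) / len(text) if text else 0.0


def should_block(text: str, entropy_limit: float = 4.5, non_ascii_limit: float = 0.3) -> bool:
    """Hypothetical thresholds; tune them against real traffic before enforcing."""
    return shannon_entropy(text) > entropy_limit or non_ascii_ratio(text) > non_ascii_limit


print(redact_secrets("token: ghp_" + "a" * 36))
print(should_block("Fix typo in README"))
```

Short strings cannot reach a high entropy value regardless of content, so a threshold like this is only meaningful on payloads of reasonable length.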

Monitoring

Monitoring captures what the agent did and surfaces anomalies for review.

  • Policy: Require all deployments to forward OpenShell egress logs to a centralized SIEM for correlation, alerting, and incident response workflows.
  • Configuration: Enable structured JSON logging for all agent actions with configurable retention policies and real-time forwarding to the monitoring infrastructure.
  • Engineering: Deploy anomaly detection on agent behavior patterns to flag unusual cron job creation, bulk data access, or unexpected egress destination requests.
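
Structured logging and rule-based flagging, as described in the tips above, might be prototyped along the lines below. The event schema, field names, and thresholds are all hypothetical and do not reflect the actual NemoClaw or OpenShell log format; the expected egress set is assumed to mirror the default policy described in this report.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional

# Assumption: expected destinations mirror the default network policy.
EXPECTED_EGRESS = {"github.com", "clawhub.ai", "registry.npmjs.org"}


def make_event(action: str, detail: str,
               destination: Optional[str] = None, count: int = 0) -> dict:
    """Build one structured event with a UTC timestamp."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "detail": detail,
        "destination": destination,
        "count": count,
    }


def flag_anomalies(event: dict, bulk_read_threshold: int = 100) -> List[str]:
    """Return rule-based flags for the three behaviors named in the tip above."""
    flags = []
    if event["action"] == "exec" and "crontab" in event["detail"]:
        flags.append("cron_job_change")
    if event["action"] == "egress" and event["destination"] not in EXPECTED_EGRESS:
        flags.append("unexpected_egress_destination")
    if event["action"] == "file_read" and event["count"] > bulk_read_threshold:
        flags.append("bulk_data_access")
    return flags


def to_log_line(event: dict) -> str:
    """Serialize the event and its flags as one JSON line for forwarding."""
    return json.dumps({**event, "flags": flag_anomalies(event)}, sort_keys=True)


print(to_log_line(make_event("exec", "crontab -e")))
print(to_log_line(make_event("egress", "HTTPS request", destination="example.org")))
```

One JSON object per line keeps the output easy to forward to a SIEM, where fixed rules like these can later be replaced by baseline-driven anomaly detection.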

7 References

The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [7] or [9][12]) refer to entries below, which are numbered continuously across the groups in citation order.

Selected Vulnerabilities

  1. CVE-2026-24222. NVD CVE 8.6 HIGH; prompt-injection-driven host env var exfiltration; patched v0.0.18
  2. CVE-2026-24231. NVD CVE 6.3 MEDIUM; SSRF via validateEndpointUrl bypass; patched v0.0.13
  3. NVIDIA Security Bulletin April 2026. Vendor advisory covering CVE-2026-24222 and CVE-2026-24231

Selected Research

  4. Lasso Security sandbox exfiltration research. Demonstrated emoji-encoded credential theft and persistent Agent Configuration Poisoning against NemoClaw
  5. eSecurity Planet NemoClaw exfiltration coverage. Independent coverage confirming attacks succeeded without violating configured sandbox policies
  6. TechTalks OpenClaw sandbox vulnerability analysis. Independent coverage of emoji-encoding bypass and SOUL.md poisoning against NemoClaw

Vendor Documentation

  7. NemoClaw Security Best Practices. Vendor security doc on deny-by-default egress, Landlock enforcement, and provider trust tiers
  8. NemoClaw Architecture. Vendor doc on OpenShell gateway, k3s sandbox, credential proxy, and filesystem layout
  9. NemoClaw Sandbox Hardening. Vendor doc on Landlock filesystem policy, capability drops, and kernel requirements
  10. NemoClaw Network Policies. Vendor doc on deny-by-default policy, default-allowed endpoints, and operator approval flow

Other Sources

  11. GitHub #272 missing binaries restriction. Open security issue: network policy presets do not restrict which binaries reach allowed endpoints
  12. GitHub #803 no custom seccomp profile. Open issue: sandbox relies on container runtime default seccomp instead of NemoClaw-specific allowlist