Hermes Agent Security Risks

Computer Agents · hermes-agent.nousresearch.com · Exposed Giants
[Figure: AI Risk Quadrant position. Axes: Defense Controls (3) and Attack Surface (8.1). Quadrants: Exposed Giants, Fortified Leaders, Humble Providers, Tight Operators.]
AIRQ Score: 5.17 (High)
Attack Surface: 8.1 (Critical)
Blast Radius: 8 (Critical)
Defense Controls: 3 (Critical)
About The Agent

Hermes Agent is a self-hosted autonomous agent that runs as a persistent process on operator-owned hardware with operator-level shell, file system, and network access. The same single-operator runtime accepts inbound prompts from a wide range of messaging clients and webhook integrations, auto-loads persistent cross-session memory at startup, executes community-published skills from the Skills Hub marketplace, and schedules unattended cron tasks that inherit the full operator context. Every input channel feeds the same prompt context with the same execution authority, and the documented default ships without execution isolation or output filtering.

About the AI Risk Quadrant

Exposed Giants agents combine a broad attack surface with high blast radius and minimal defense controls. Hermes Agent earns this placement through unrestricted local shell execution, plaintext credential storage accessible to sandboxed scripts, an approval system defeatable by a single configuration flag, and a Skills Hub marketplace where community skills bypass the static security scanner entirely. The messaging gateway bridges external input to full agent execution without content filtering, while the default runtime provides no execution isolation, no output guardrails, and no real-time monitoring, leaving operators to layer every meaningful control on their own.

1 Key Risks

This section covers the most critical security risks an operator inherits when deploying this agent in its documented default configuration. Hermes Agent's default configuration exposes the operator's host credentials, file system, and network to attacker-controlled input from unscanned skill files, unauthenticated webhooks, and an unrestricted local shell backend.

Key Input Risks
Unscanned skill description files inject directly into the system prompt [8]. Webhook payloads feed attacker-controlled fields into the agent prompt [9], and the Skills Guard scanner falls to dynamic imports [6]. Input Guardrails cover only direct operator prompts.
Key Execution Risks
The local terminal backend ships as the default and runs with operator-level privileges. An authentication flaw in the API server (CVE-2026-7112, CVSS 5.6) permits unauthenticated prompt-to-RCE when the bind address is changed to a non-localhost interface [1]. PYTHONPATH injection gives sandboxed scripts access to API keys and OAuth tokens [5].
Key Action Risks
The built-in cron scheduler fires unattended tasks with full capabilities, and the approval gate falls to a single YOLO flag [14]. Credential access reaches plaintext API keys and OAuth tokens exposed through PYTHONPATH injection [5]. The write-denied path list omits control-plane files [10].
Key Output Risks
PYTHONPATH injection lets sandboxed scripts steal API keys via encoded sandbox output that evades the redaction layer [5]. No output filtering or DLP operates at the agent boundary by default. CVE-2026-7396 (CVSS 5.3) demonstrated path traversal reading files outside the intended directory [3].
Key Monitoring Risks
Structured logging is documented, but centralized forwarding, anomaly detection, and real-time alerting are absent by default [14]. Credential file reads and configuration changes produce no tamper-evident audit trail, leaving silent control-plane modifications undetectable [10].

2 AIRQ Scores

The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. Hermes Agent's scores reflect a broad, evidence-anchored attack surface paired with high blast radius on the operator's local host and minimal vendor-provided default controls.

AIRQ Metrics

Hermes Agent falls into the Exposed Giants quadrant because the combination of unrestricted local shell execution, unscanned external input channels, and plaintext credential storage produces high exposure across both the attack surface and blast radius axes while vendor-provided default controls remain minimal. There is no single architectural chokepoint a defender can rely on: the same operator-level runtime drives shell, file system, and network tools, accepts inbound prompts from messaging clients and webhooks, and executes community skills from the marketplace, so every integration shares the same prompt context and execution authority.

The three component metrics (Attack Surface, Blast Radius, and Defense Controls) land in the Critical range, and the composite AIRQ Score is rated High. These ratings are grounded in demonstrable rather than theoretical patterns: NVD-listed authentication bypasses in the API server and webhook adapter, PYTHONPATH injection exfiltrating credentials from the code sandbox, a Skills Guard bypass achieving zero findings on a malicious skill, and path traversal escaping the quarantine boundary. Vendor security documentation and the vendor trust model anchor the architectural base bands, while GitHub security issues and independent research provide the agent-specific penalty evidence.

Metric Score Comments
AIRQ Score 5.17 Evidence base spans agent-specific NVD CVEs covering authentication bypasses and file operations, independently reproduced GitHub security issues demonstrating prompt injection, credential exfiltration, and security scanner bypass, vendor security documentation describing the defense model and trust architecture, and third-party research on tool registry poisoning and memory poisoning vectors. Every attack surface is anchored on primary sources; defense posture is grounded in vendor-published defaults confirmed absent by the documented single-tenant trust model, where the security boundary protects the operator from LLM actions rather than from external adversaries.
Blast Radius 8 / 10 The default local terminal backend executes commands with operator-level privileges on the host, with full outbound network connectivity and no SSRF mitigation or egress filtering. Credential exposure spans plaintext API keys and OAuth tokens in configuration files, demonstrably reachable through PYTHONPATH injection in the code sandbox. The cron scheduler fires unattended tasks that inherit the full operator context including credentials and persistent memory state. Only the deployment access factor scores below the architectural midpoint, reflecting the absence of dedicated deployment tools for cloud infrastructure or production environments.
Attack Surface 8.1 / 10 Agent-specific NVD CVEs and independently reproduced GitHub security issues push multiple surfaces to the adjusted ceiling, with authentication bypass in the API server and webhook adapter, Skills Guard defeat via dynamic imports, and PYTHONPATH injection exfiltrating credentials through the sandbox output path. The vendor-documented architecture provides a model-agnostic reasoning loop, persistent cross-session memory, cron-scheduled automation, and a community skill marketplace that all feed the same prompt context with the same execution authority. All three trifecta dimensions are triggered on the documented default, confirming full-chain exfiltration viability where untrusted input reaches credentials and exits through unrestricted outbound channels.
Defense Controls 3 / 15 The vendor describes a seven-layer defense model whose documented layers include command approval, container options, MCP credential filtering, context scanning, cross-session isolation, and input sanitization, but the default configuration ships with the local terminal backend providing no execution isolation and no output guardrails. The command approval system is the only vendor-implemented action control, and it falls to a single YOLO mode flag. Monitoring is limited to structured logging without SIEM forwarding or anomaly detection. Independent security research confirms bypass paths through the Skills Guard scanner and the credential isolation boundary.

3 Attack Surface

Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. The dominant exposures are unfiltered multi-channel input ingestion across messaging clients and webhooks, persistent memory injection, a community skill marketplace, and a model-agnostic reasoning loop with unrestricted tool authority.

Attack Surface Metrics

Higher scores reflect surfaces where attacker-controlled input reaches the reasoning loop with minimal filtering and demonstrated exploitation paths anchor the adjusted ceiling.

Each row ties a named surface to its base architectural band, agent-specific evidence, and a comment describing the entry point and documented exploitation path.

Surface Score Comments
User Input 5 / 4 Skill description files evade all prompt injection scanning, with their contents injected word-for-word into the system prompt and enabling persistent instruction override that survives session boundaries [8]. The messaging gateway accepts multi-platform input without content filtering [14]. Independent research documents tool registry poisoning vectors where injected function call descriptions redirect agent behavior through the user input channel [12].
External Data 5 / 4 Webhook routes forward attacker-controlled payload fields directly into the agent prompt for execution even when HMAC validation is present on the route [9]. The SMS gateway accepts webhook callbacks without validating the Twilio signature header, enabling forged requests to drive full agent execution [7]. The agent fetches and processes external web content, MCP tool responses, and marketplace skill packages without pre-ingestion validation [16].
Memory 3 / 4 The agent maintains persistent cross-session memory using an FTS5-indexed database that survives restarts and is injected as frozen snapshots into the system prompt [18]. Automated skill codification from successful interactions creates a second persistence vector alongside explicit memory entries. Independent analysis identifies memory poisoning vectors where persistent entries amplify compromised instructions across sessions [13].
Reasoning 3 / 4 The agent delegates reasoning to interchangeable external LLM providers through a model-agnostic architecture, meaning prompt injection payloads in any input channel reach whichever model the operator has configured without per-provider hardening [14]. No output validation differentiates the reasoning path from the tool-execution path, and the system prompt carries all input context including attacker-controllable memory and skill descriptions.
Planning 3 / 4 The agent autonomously schedules tasks via a built-in cron system, delegates to spawned subagents, and chains execution contexts across scheduled runs [14]. The approval gate for autonomous actions can be bypassed via a single YOLO mode flag, removing the operator from the planning loop entirely [15].
Tool Execution 5 / 4 The default local terminal backend executes shell commands with operator-level privileges [14]. CVE-2026-7112 (CVSS 5.6) enables unauthenticated prompt-to-RCE when the API server bind address is changed to a non-localhost interface [1]. CVE-2026-7113 (CVSS 5.6) enables unauthenticated full agent execution through the webhook adapter when INSECURE_NO_AUTH is configured as the route secret [2]. The agent ships with a wide range of built-in tools including file operations, shell execution, browser automation, and web fetching [16].
Orchestration 3 / 4 The agent spawns subagents, schedules cron jobs for unattended execution, and chains execution contexts across runs without scoping restrictions on the delegated context [14]. Subagent delegation passes the full operator context including credentials and memory state. Independent research maps tool registry poisoning and function-call injection vectors in the orchestration layer [12].
Inter-Agent 3 / 4 The agent integrates with external tools and services through MCP without inter-agent authentication, and the Skills Hub marketplace distributes community-authored skill packages that execute within the agent's full privilege scope [14]. No trust boundary separates marketplace-installed skills from vendor-provided built-in tools. The open-source repository confirms the absence of skill-level sandboxing or permission scoping [17].
Output Processing 4 / 4 The code execution sandbox injects the project root into PYTHONPATH, letting sandboxed scripts import internal modules and exfiltrate API keys through encoded output that passes unfiltered through the agent boundary [5]. The sandbox output path does not sanitize structured data before returning it to the agent context, enabling credential theft through side channels.
Configuration 5 / 4 The Skills Guard security scanner is fully bypassed by dynamic imports and string construction, achieving zero findings on an exfiltrating community skill [6]. Path traversal in the credential_files and skills_hub quarantine directories allows reads and writes outside the sandbox boundary before the security scan executes [11]. The write-denied path list omits Hermes control-plane files, letting a compromised prompt silently modify auth.json and config.yaml [10].
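
The Configuration row notes that Skills Guard is defeated by dynamic imports and string construction [6], and the reference list describes that scanner as regex-based. The sketch below illustrates, under stated assumptions, why inspecting the syntax tree catches this particular evasion where matching literal `import x` text does not. The function name `find_dynamic_imports` and the set of flagged calls are hypothetical and are not the vendor's scanner; a check like this is one review signal and can still be evaded by other techniques.

```python
import ast

# Hypothetical rule set: calls that load or run code chosen at runtime.
DYNAMIC_CALLS = {"__import__", "eval", "exec", "compile"}
DYNAMIC_ATTRS = {"import_module"}  # e.g. importlib.import_module(...)


def find_dynamic_imports(source: str) -> list[tuple[int, str]]:
    """Return (line, name) for every dynamic-loading call in the source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in DYNAMIC_CALLS:
            findings.append((node.lineno, func.id))
        elif isinstance(func, ast.Attribute) and func.attr in DYNAMIC_ATTRS:
            findings.append((node.lineno, func.attr))
    return findings
```

Because the check looks at the call itself rather than the module name, building the name from string fragments (for example `"so" + "cket"`) does not hide the call from it.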

The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. Hermes Agent exhibits all three on the documented default: an operator's API keys, OAuth tokens, and persistent session data are one injected instruction away from exfiltration through the messaging gateway or local shell, crossing no vendor-provided control.

Lethal Trifecta · Complete (3 of 3)

Hermes Agent exhibits all three of these conditions in its documented default configuration:

  • Untrusted input — Unscanned skill description files [8] and webhook payloads [9] feed attacker-controlled text directly into the reasoning loop on the documented default configuration.
  • Sensitive data — PYTHONPATH injection exposes plaintext API keys and OAuth tokens in hermes_cli.config and auth.json by allowing sandboxed scripts to import internal configuration modules [5].
  • External egress — Unrestricted outbound network on the local backend and the messaging gateway's external chat and webhook channels provide multiple exfiltration paths with no egress filtering [14].
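
The three conditions above can be expressed as a simple predicate, which is useful when reviewing whether a hardening change actually breaks the chain. This is a minimal sketch; the `SessionCapabilities` type and its field names are hypothetical and are not part of Hermes Agent or of the AIRQ methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCapabilities:
    """Hypothetical summary of what one agent session can do."""
    reads_untrusted_input: bool
    reaches_sensitive_data: bool
    has_external_egress: bool


def trifecta_count(caps: SessionCapabilities) -> int:
    """Number of trifecta conditions present (0 to 3)."""
    return sum((caps.reads_untrusted_input,
                caps.reaches_sensitive_data,
                caps.has_external_egress))


def is_lethal_trifecta(caps: SessionCapabilities) -> bool:
    """True only when all three conditions hold in the same session."""
    return trifecta_count(caps) == 3
```

Under this model, removing any single condition (for example, blocking outbound network access) is enough for `is_lethal_trifecta` to return `False`, even though the remaining two risks still need attention.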

4 Blast Radius

The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. A successful compromise of Hermes Agent reaches the operator's local host with unrestricted shell, file system, and network access, plus plaintext credentials and unattended cron automation.

Blast Radius Metrics

Higher scores reflect factors where the agent's default configuration provides direct, unrestricted access to the operator's host resources with demonstrated exploitation paths.

Each row ties a blast factor to the scope of access the agent holds by default and the evidence anchoring the capability assessment.

Factor Score Comments
Code execution 3 / 4 Commands execute with the operator's full privilege set on the local terminal backend, with no separation between agent-initiated and operator shell sessions [14]. Built-in tools include direct shell execution, Python script evaluation, and browser automation, all inheriting the operator's file system and network permissions [16].
File system access 3 / 4 Symlink following in file_tools.py bypasses the sensitive-path check mechanism, allowing writes outside the allowed boundary; patched in 0.9.0 [4]. CVE-2026-7396 demonstrates path traversal in the WeChat Work platform adapter reading files outside the intended directory [3]. The built-in file tools provide read, write, and directory traversal capabilities across the operator's file system with an incomplete deny list.
Network access 4 / 4 The default local backend provides unrestricted outbound network access with no SSRF protection or egress filtering [14]. The agent fetches arbitrary URLs, makes API calls, and sends messages through the messaging gateway to external services without network policy restrictions. The vendor trust model confirms that the security boundary addresses LLM actions on the local host, not outbound network access [15].
Credential access 4 / 4 PYTHONPATH injection in the code sandbox exposes hermes_cli.config API keys and auth.json OAuth tokens by allowing sandboxed code to import internal modules [5]. The credential management system stores secrets in plaintext configuration files accessible to agent-initiated processes. Hermes control-plane files are absent from the write-denied list, so a compromised prompt can modify auth.json and redirect credential flows [10].
Autonomous action 3 / 4 The built-in cron scheduler fires unattended task execution with full agent capabilities, and the optional approval gate can be bypassed via the YOLO mode flag allowing autonomous actions without operator confirmation [14]. Scheduled tasks receive the complete operator context with credentials and persistent memory state [15].
Deployment access 2 / 4 The agent does not ship dedicated deployment tools for cloud infrastructure, IaC, or production environment management [16]. Deployment access is limited to what the operator's local shell environment provides and is not agent-mediated. The vendor documents the agent as a personal productivity tool rather than a deployment automation platform [19].
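
The file system row describes symlink following [4] and path traversal [3] that escape an intended directory. A common mitigation pattern is to resolve the final path, including symlinks, and confirm it still sits under the allowed base before any read or write. The sketch below assumes a hypothetical helper name, `resolve_inside`, and is not taken from the Hermes Agent code base.

```python
import os


def resolve_inside(base_dir: str, untrusted_name: str) -> str:
    """Resolve untrusted_name under base_dir, or raise ValueError if it escapes.

    realpath() collapses ".." segments and follows symlinks, so the check is
    made on the location the operating system would actually touch.
    """
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, untrusted_name))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes {base!r}: {untrusted_name!r}")
    return candidate
```

The check also rejects absolute paths, because `os.path.join` discards the base when the second argument is absolute. It does not close the race between checking and opening a file, so it should be treated as one layer rather than a complete fix.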

5 Defense Controls

Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. The vendor documents a seven-layer defense model, but the default configuration ships with no execution isolation, no output guardrails, and a single-step approval bypass.

Defense Controls Metrics

Higher scores indicate stronger vendor-provided controls at default. Unlike the attack surface and blast radius tables, a low score here is the unfavorable result: most controls for this agent are absent or operator-managed.

Each row evaluates a defense component against what the vendor implements at default versus what falls to the operator to configure or add.

Component Score Comments
Input Guardrails 1 / 3 The vendor documents context file scanning and input sanitization as part of a broader defense model, but the approval gate covers only direct operator commands [14]. Prompt injection scanning is absent for community skill description files, which enter the system prompt without content filtering [8]. Webhook payloads and MCP tool responses also reach the reasoning loop unfiltered.
Execution Isolation 0 / 3 The default local terminal backend runs with operator-level privileges and no sandbox or container isolation [14]. Docker and container backends are configurable alternatives but not the default. The code sandbox's PYTHONPATH injection demonstrates that even the available isolation boundary is incomplete, exposing internal modules and configuration credentials to sandboxed scripts [5].
Action Controls 1 / 3 The vendor documents a command approval system that requires operator confirmation before executing sensitive actions, but the YOLO mode flag provides a single-step bypass that removes approval entirely [14]. Skills Guard falls completely to dynamic import techniques, producing zero detections on a skill designed to exfiltrate data [6].
Output Guardrails 0 / 3 No output filtering, DLP, or redaction mechanism operates at the agent boundary by default [14]. The code sandbox output path does not sanitize structured data, enabling credential exfiltration through encoded output [5]. The vendor trust model focuses on protecting the operator from LLM actions rather than on outbound data controls [15].
Monitoring 1 / 3 No centralized log forwarding, anomaly detection, or alerting infrastructure ships in the default configuration [16]. The vendor documents structured logging for individual agent actions and tool invocations, but the open-source repository confirms no tamper-evident audit trail or monitoring integration is present by default [17].
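
The component scores in this table are consistent with the headline Defense Controls figure of 3 / 15 if that figure is a plain sum of five components scored out of 3 each. The report does not state the aggregation rule, so the sketch below treats the simple sum as an assumption.

```python
# Component scores as listed in the table above; each is out of 3.
COMPONENT_MAX = 3
component_scores = {
    "Input Guardrails": 1,
    "Execution Isolation": 0,
    "Action Controls": 1,
    "Output Guardrails": 0,
    "Monitoring": 1,
}

# Assumption: the headline score is an unweighted sum of the components.
total = sum(component_scores.values())
maximum = COMPONENT_MAX * len(component_scores)
print(f"Defense Controls: {total} / {maximum}")
```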

6 Hardening Tips

This section lists concrete actions an operator can take to reduce the risks reported above, grouped by the defense control each tip strengthens. Operators should prioritize execution isolation and input filtering to break the full-chain exfiltration path from untrusted input through credential access to unrestricted egress.

Input Guardrails

Input guardrails intercept adversarial content before it reaches the reasoning loop.

  • Policy: Require manual review and approval before installing any community skill to block unscanned prompt injection payloads from entering the system prompt. Counters skill description injection [8].
  • Configuration: Enforce cryptographically random webhook route secrets and validate Twilio signature headers on all inbound SMS callbacks. Counters unauthenticated webhook execution [7].
  • Engineering: Wire a prompt injection classifier between the messaging gateway and the reasoning loop to filter attacker-controlled payloads before they reach the model. Counters external payload injection [9].
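
The Configuration tip above can be sketched with the Python standard library. The signature check follows the scheme Twilio publicly documents for form-encoded POST callbacks (HMAC-SHA1 over the request URL followed by the sorted parameter names and values, Base64-encoded, compared against the `X-Twilio-Signature` header). The function names are hypothetical, and how Hermes Agent exposes the URL, parameters, and header to such a check is an assumption; in practice the request validator in Twilio's own helper library is the safer choice.

```python
import base64
import hashlib
import hmac
import secrets


def new_route_secret() -> str:
    """Generate a random webhook route secret (32 bytes of entropy)."""
    return secrets.token_urlsafe(32)


def twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """Compute the expected signature for a form-encoded POST callback."""
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def is_valid_twilio_request(auth_token: str, url: str,
                            params: dict[str, str], header_signature: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode(), header_signature.encode())
```

A gateway that rejects any callback for which `is_valid_twilio_request` returns `False` would no longer accept forged SMS webhooks, provided the URL passed in matches the one Twilio signed.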

Execution Isolation

Execution isolation contains what a compromised agent can do on the host.

  • Policy: Mandate containerized terminal backends for all production deployments to establish a filesystem and network boundary around agent-initiated commands. Counters default operator-level privilege exposure [14]; raises execution isolation from 0 toward 2.
  • Configuration: Switch the terminal backend from local to Docker with a read-only root filesystem and restricted network namespace. Counters unrestricted host shell access [15].
  • Engineering: Isolate the code sandbox Python environment from the host PYTHONPATH to prevent internal module imports that expose configuration credentials. Counters sandbox credential exfiltration [5].
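
One way to approach the Engineering tip above is to launch sandboxed code in a child interpreter that ignores `PYTHONPATH` and receives none of the parent's environment variables. The sketch below uses CPython's `-I` (isolated mode) flag and a minimal environment; `run_isolated` is a hypothetical helper, the environment shown assumes a POSIX host, and this is process hygiene only. It does not restrict file reads, so it complements rather than replaces a container backend.

```python
import os
import subprocess
import sys


def run_isolated(code: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Run code in a child interpreter without PYTHONPATH or inherited secrets.

    -I makes the interpreter ignore PYTHON* environment variables and keeps
    the current directory and user site-packages off sys.path.
    """
    clean_env = {"PATH": os.environ.get("PATH", "")}  # nothing else is passed on
    return subprocess.run(
        [sys.executable, "-I", "-c", code],
        env=clean_env,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

With this launch pattern, API keys held in the parent's environment and modules reachable only through the project root on `PYTHONPATH` are not importable from the child by default.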

Action Controls

Action controls govern which tools and actions the agent can invoke autonomously.

  • Policy: Disable YOLO mode in production and require operator approval for all state-modifying tool invocations. Counters single-flag approval bypass [15].
  • Configuration: Restrict the enabled tool set to a deny-by-default allowlist matching the deployment's specific use case. Counters the broad built-in tool attack surface [16].
  • Engineering: Add Hermes control-plane files to the write-denied path list to prevent compromised prompts from silently modifying auth.json and config.yaml. Counters control-plane file write [10].
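
The Engineering tip above amounts to a deny-list check performed on the resolved path, so that `..` segments or symlinks cannot disguise a write to a control-plane file. In the sketch below the file names `auth.json` and `config.yaml` come from the report, while the `hermes_home` directory argument and the `is_write_denied` helper are hypothetical; where Hermes Agent actually stores these files is not stated here.

```python
import os

# File names taken from the report; their location is supplied by the caller.
CONTROL_PLANE_FILES = ("auth.json", "config.yaml")


def is_write_denied(target: str, hermes_home: str, extra_denied=()) -> bool:
    """Return True if target resolves to a protected control-plane file."""
    real_target = os.path.realpath(target)
    home = os.path.realpath(hermes_home)
    denied = {os.path.join(home, name) for name in CONTROL_PLANE_FILES}
    denied.update(os.path.realpath(path) for path in extra_denied)
    return real_target in denied
```

A file tool would call this before every write and refuse the operation when it returns `True`, regardless of whether the request came from the operator prompt, a skill, or a scheduled task.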

Output Guardrails

Output guardrails inspect what the agent sends to other systems and users.

  • Policy: Require all agent output to pass through a DLP scanner before reaching external channels to detect and redact credentials and PII. Counters encoded output exfiltration [5].
  • Configuration: Configure the code sandbox to sanitize structured return values before they re-enter the agent context. Counters side-channel credential theft through encoded output [14].
  • Engineering: Implement content inspection on the messaging gateway output path to block credential patterns and sensitive data from webhook and chat responses. Counters unfiltered egress channels [14].
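
Because the report describes credentials leaving through encoded sandbox output [5], an output filter needs to look inside encoded runs as well as at plain text. The sketch below redacts matches for two illustrative secret patterns and also decodes Base64-looking runs to check them against the same patterns. The patterns, thresholds, and function names are hypothetical placeholders; real key formats vary by provider, and an attacker can choose other encodings, so this is a starting point rather than a complete DLP layer.

```python
import base64
import binascii
import re

# Hypothetical secret shapes; replace with the formats your providers issue.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]{20,}"),
]
B64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def contains_secret(text: str) -> bool:
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)


def _decode_or_none(run: str):
    """Decode a Base64 run to text, or return None if it is not valid Base64."""
    try:
        return base64.b64decode(run, validate=True).decode("utf-8", errors="ignore")
    except (binascii.Error, ValueError):
        return None


def redact(text: str) -> str:
    """Redact plain-text secrets, then Base64 runs that decode to secrets."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)

    def replace_run(match: re.Match) -> str:
        decoded = _decode_or_none(match.group(0))
        if decoded and contains_secret(decoded):
            return "[REDACTED-ENCODED]"
        return match.group(0)

    return B64_RUN.sub(replace_run, text)
```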

Monitoring

Monitoring captures what the agent did and surfaces anomalies for review.

  • Policy: Require centralized SIEM forwarding for all agent action logs with alerting rules for anomalous tool invocations and credential access patterns. Counters absent real-time detection [14].
  • Configuration: Enable tamper-evident audit logging for all credential file reads and configuration changes with immutable storage. Counters silent control-plane modification [10].
  • Engineering: Instrument the messaging gateway to log all outbound messages with recipient and payload metadata for exfiltration detection. Counters unmonitored egress channels [15]; expected to improve monitoring from 1 toward 2.
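
Tamper-evident logging, as named in the Configuration tip above, is commonly built as a hash chain: each entry stores the hash of the previous entry, so editing or deleting an earlier record invalidates everything after it. The sketch below is a minimal in-memory illustration with hypothetical function names and event fields; a real deployment would also need to ship entries to storage the agent cannot modify.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append_entry(log: list, event: dict) -> None:
    """Append an event, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash,
                "hash": _entry_hash(prev_hash, event)})


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or removed entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```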

7 References

The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [7, 13]) refer to entries below, listed in citation order.

Selected Vulnerabilities

  1. CVE-2026-7112: Improper authentication in API server when API_SERVER_KEY unset and bind changed to non-localhost (CVSS 5.6)
  2. CVE-2026-7113: Missing authentication in webhook adapter with INSECURE_NO_AUTH route secret enables unauthenticated agent execution (CVSS 5.6)
  3. CVE-2026-7396: Path traversal in WeChat Work platform adapter reads files outside intended directory (CVSS 5.3)
  4. CVE-2026-7397: Symlink following in file_tools.py bypasses sensitive-path checks; patched in 0.9.0 (CVSS 4.4)
  5. PYTHONPATH injection in code sandbox: Sandbox inherits the project root in PYTHONPATH, enabling sandboxed code to reach internal configuration modules and extract API keys
  6. Skills Guard bypass via dynamic import: Regex-based security scanner fully bypassed by dynamic imports achieving zero findings on exfiltrating skill
  7. Unauthenticated RCE via SMS webhook: SMS gateway HTTP server accepts webhook callbacks without Twilio signature validation
  8. Prompt injection via unscanned DESCRIPTION.md: Skill description files evade prompt injection scanning and are injected word-for-word into the system prompt
  9. Webhook payload prompt injection: Attacker-controlled business fields in webhook payloads rendered directly into agent prompt for execution
  10. Control-plane file write via file tools: Write-denied path list omits Hermes control-plane files letting compromised prompt modify auth.json and config.yaml
  11. Path traversal in skills_hub quarantine: Unsanitized path joins in credential_files and quarantine_bundle allow reads and writes outside sandbox boundary

Selected Research

  12. Hermes Agent tool registry and function-call injection analysis: Ship Safe published 17 Hermes-specific detection rules covering tool registry poisoning and function-call injection
  13. Securing Hermes Agent against memory poisoning: Independent analysis of Hermes Agent memory poisoning vectors referencing OWASP ASI06

Vendor Documentation

  14. Hermes Agent security documentation: Vendor documents seven-layer defense model covering command approval, container isolation, MCP credential filtering, and input sanitization
  15. Hermes Agent SECURITY.md trust model: Vendor states Hermes is single-tenant; security model protects operator from LLM actions; default execution is local host
  16. Hermes Agent documentation hub: Comprehensive documentation covering 70-plus built-in tools, MCP integration, memory system, skills system, and cron scheduling

Other Sources

  17. NousResearch/hermes-agent GitHub repository: Open-source repository under MIT license with over 160K stars enabling direct code inspection of every security boundary
  18. Hermes Agent memory system documentation: Vendor documents persistent memory architecture including FTS5 search and frozen snapshot injection into system prompt
  19. Hermes Agent vendor landing page: Landing page describes Hermes as self-hosted autonomous agent with persistent memory, scheduled automations, and subagent delegation