1 Key Risks
The most critical security risks an operator inherits when deploying this agent in its documented default configuration. ElevenLabs ElevenAgents exposes unfiltered voice input to the LLM, delegates tool execution to uncontrolled external endpoints, and lacks default-on output filtering or security-event monitoring.
2 AIRQ Scores
The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. Moderate defense controls partially offset a borderline attack surface, yielding a composite AIRQ score in the mid-range of the Humble Providers quadrant.
ElevenLabs ElevenAgents lands in the Humble Providers quadrant, with an attack surface of 4.86, just below the 5.0 boundary, and a blast radius of 4.63, well within the Humble Providers threshold.
The following table summarizes the composite AIRQ score and its three component axes derived from the per-surface and per-factor scoring below.
| Metric | Score | Comments |
|---|---|---|
| AIRQ Score | 3.65 | Moderate defense partially offsets a borderline attack surface, yielding a mid-range composite. |
| Blast Radius | 4.63 / 10 | Network and credential factors dominate; code execution and file system remain vendor-managed and inaccessible. |
| Attack Surface | 4.86 / 10 | Configuration surface carries the highest adjusted score due to a demonstrated path traversal in the vendor MCP server. [1] |
| Defense Controls | 4 / 15 | Cloud isolation and Alpha-stage guardrails provide partial coverage; output filtering and monitoring are absent. |
3 Attack Surface
Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. Risk concentrates in configuration, user input, tool execution, and inter-agent communication, where external MCP servers and webhook endpoints operate outside vendor-managed security boundaries.
Scores range from 0 (no documented surface) to 4 (wide open with demonstrated exploit); an evidence penalty adds up to 1.0 for confirmed vulnerabilities.
Each row scores one attack surface component on its documented default configuration, with adjusted scores reflecting any evidence-based penalty.
| Surface | Score | Comments |
|---|---|---|
| User Input | 3 / 4 | Multiple unvalidated voice channels with MCP tool responses feeding the prompt context; systematic red-teaming identifies prompt injection and audio-native attack vectors. [4][9][12][14] |
| External Data | 2 / 4 | Knowledge base ingests operator-uploaded documents and URL-scraped content into the RAG pipeline with no content validation gate documented. [13] |
| Memory | 2 / 4 | RAG-indexed knowledge base persists across conversations; conversation history stored by default with optional redaction but no integrity verification. [13] |
| Reasoning | 2 / 4 | Model-agnostic architecture delegates reasoning to external LLMs selected by the operator with no independent reasoning-chain verification documented. [8] |
| Planning | 1 / 4 | Single-turn voice conversations with no autonomous multi-step planning, delegation, or scheduling capability documented. [8] |
| Tool Execution | 3 / 4 | Webhook and MCP tools issue outbound HTTP requests and run server-side logic on operator infrastructure without vendor-managed sandboxing. [9] |
| Orchestration | 2 / 4 | Agent transfer and batch outbound calls documented; no background task execution, daemon operation, or cron scheduling. [8] |
| Inter-Agent | 3 / 4 | External MCP servers connect without vendor-managed authentication or message integrity validation between the agent and the tool server. [9][17] |
| Output Processing | 2 / 4 | Voice output with optional conversation history redaction; no DLP or exfiltration blocking for webhook or MCP tool outputs. [8] |
| Configuration | 4 / 4 | Path traversal in the official MCP server resource handler allowed reading arbitrary host files via absolute path bypass of base_dir containment; client SDK CSP bypass weakens XSS protection. [1][2][3] |
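The Configuration row above describes an absolute path bypassing `base_dir` containment. The following is a minimal sketch of a containment check that covers both absolute and relative paths. It is not the vendor's actual fix; the function name `resolve_within` is hypothetical, and the sketch assumes Python 3.9 or later on a POSIX host.

```python
from pathlib import Path


def resolve_within(base_dir: str, requested: str) -> Path:
    """Resolve `requested` against `base_dir`, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    # Joining with an absolute `requested` discards `base` entirely, which is
    # the bypass pattern described above, so containment is checked only
    # after the joined path has been fully resolved.
    candidate = (base / requested).resolve()
    if not candidate.is_relative_to(base):  # available from Python 3.9
        raise PermissionError(f"{requested!r} resolves outside {base}")
    return candidate
```

Checking the resolved path, rather than inspecting the input string for `..` or a leading slash, handles absolute paths, parent-directory segments, and symlinks with a single rule.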
The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session. Together, these three conditions turn an isolated prompt injection into full-chain exfiltration. Here, untrusted callers supply input that reaches sensitive data stores and exits through unrestricted webhook and MCP egress channels.
ElevenLabs ElevenAgents exhibits all three of these conditions in its documented default configuration:
- Untrusted input: Voice callers and MCP server tool responses supply unvalidated content to the LLM reasoning loop without a default-on input filter. [9]
- Sensitive data: Knowledge bases hold sensitive business documents, and conversation history captures caller PII and credential data from dynamic variables. [13]
- External egress: Webhook tools make unrestricted outbound HTTP requests, and MCP tools transmit conversation data to external servers outside the operator trust boundary. [9][17]
4 Blast Radius
The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. The blast radius concentrates in network egress and credential exposure while code execution, file system access, and deployment modification remain inaccessible within the vendor-managed cloud platform.
Scores range from 0 (no access documented) to 4 (unrestricted access with demonstrated exploitation); higher scores indicate greater damage potential from a successful attack.
Each row evaluates one blast radius factor based on the documented capabilities an attacker could leverage after compromising the agent.
| Factor | Score | Comments |
|---|---|---|
| Code execution | 1 / 4 | No arbitrary code execution; cloud-hosted vendor infrastructure with no shell or interpreter access for operators or callers. [8] |
| File system access | 1 / 4 | No file system access; knowledge base is a vendor-managed read-only store, not an operator-accessible file path. [8] |
| Network access | 3 / 4 | Unrestricted outbound HTTP via webhook tools and MCP connections to operator-specified endpoints with no vendor-side domain restriction. [9][17] |
| Credential access | 3 / 4 | Webhook tools carry API keys in request headers; MCP servers access secrets via workspace store; voice-based social engineering has demonstrated credential capture. [9][15] |
| Autonomous action | 2 / 4 | Batch outbound calls and agent transfer operate within operator-configured bounds; no fully autonomous scheduling or unattended execution documented. [8] |
| Deployment access | 1 / 4 | No infrastructure modification, deployment triggering, or CI/CD access documented; cloud-only vendor-managed platform with no operator infrastructure control. [8] |
5 Defense Controls
Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. Defense controls rely on cloud-hosted execution isolation and an Alpha-stage input guardrail, leaving output filtering and security monitoring entirely absent from the documented default configuration.
Scores range from 0 (no documented control) to 3 (fully enforced with independent verification); higher scores reduce the composite AIRQ risk.
Each row evaluates one defense control component based on its documented availability, enforcement mode, and independent verification status.
| Component | Score | Comments |
|---|---|---|
| Input Guardrails | 1 / 3 | Manipulation guardrail detects prompt injection attempts; available in Alpha status, requires explicit operator enablement; vendor safety framework recommends simulation-based red teaming. [7][12] |
| Execution Isolation | 2 / 3 | Cloud-hosted vendor-managed infrastructure isolates agent execution from operator systems with SOC 2 Type 2 compliance; MCP tools delegate to external servers outside vendor control. [8][9][11] |
| Action Controls | 1 / 3 | MCP tool approval modes are configurable per-tool with Always Ask recommended; signed URL and hostname allowlist authentication restrict access sources. [9][10] |
| Output Guardrails | 0 / 3 | Conversation history redaction detects sensitive entities but coverage is incomplete; no DLP, exfiltration blocking, or URL sanitization documented. [8] |
| Monitoring | 0 / 3 | Analytics dashboard with evaluation criteria and conversation logs; guardrail triggers logged; no SIEM integration or anomaly detection documented. [8] |
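As a quick arithmetic check, the 4 / 15 Defense Controls figure in the AIRQ Scores table matches an unweighted sum of the five component scores above. The sketch below assumes that simple-sum aggregation; the report does not state the aggregation rule explicitly.

```python
# Component scores copied from the Defense Controls table (each out of 3).
DEFENSE_SCORES = {
    "Input Guardrails": 1,
    "Execution Isolation": 2,
    "Action Controls": 1,
    "Output Guardrails": 0,
    "Monitoring": 0,
}
MAX_PER_COMPONENT = 3

# Assumption: the headline figure is a plain, unweighted sum.
total = sum(DEFENSE_SCORES.values())
maximum = MAX_PER_COMPONENT * len(DEFENSE_SCORES)
print(f"Defense Controls: {total} / {maximum}")  # prints "Defense Controls: 4 / 15"
```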
6 Hardening Tips
Concrete actions an operator can take to reduce the risks reported above, grouped by the defense control each tip strengthens. Each of the five controls receives one policy, one configuration, and one engineering countermeasure.
Input Guardrails
Input guardrails intercept adversarial content before it reaches the reasoning loop.
- Policy: Mandate Manipulation guardrail enablement for every production agent in the organizational deployment checklist and audit quarterly for coverage gaps across agent configurations. [12]
- Configuration: Enable Manipulation guardrail with Hang Up exit strategy for high-risk agents and Notify for lower-risk ones to detect prompt injection without disrupting service. [12]
- Engineering: Deploy an external prompt-injection classifier between the transcription layer and the LLM to catch audio-native adversarial inputs that text-only guardrails miss. [5][6][16]
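The engineering tip above places a check between transcription and the LLM. The sketch below shows only the gating logic, under stated assumptions: `keyword_scorer` is a toy stand-in for a real classifier, the phrases and the 0.5 threshold are illustrative, and every name is hypothetical. It inspects transcript text only, so detecting adversarial audio before transcription would need a separate component.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical scorer type: returns an injection likelihood in [0, 1].
Scorer = Callable[[str], float]


def keyword_scorer(transcript: str) -> float:
    """Toy stand-in for a real classifier; the phrases are illustrative only."""
    phrases = ("ignore previous instructions", "system prompt", "developer mode")
    text = transcript.lower()
    hits = sum(phrase in text for phrase in phrases)
    return min(1.0, hits / 2)


@dataclass
class GateResult:
    allowed: bool
    score: float


def gate_transcript(transcript: str, scorer: Scorer = keyword_scorer,
                    threshold: float = 0.5) -> GateResult:
    """Decide whether a transcript may be forwarded to the LLM."""
    score = scorer(transcript)
    # Scores at or above the threshold are held back for review or hang-up.
    return GateResult(allowed=score < threshold, score=score)
```

Keeping the scorer pluggable lets an operator swap the keyword heuristic for a trained model without changing the call site.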
Execution Isolation
Execution isolation contains what a compromised agent can do on the host.
- Policy: Require all MCP servers to run in isolated containers with least-privilege network policies reviewed and approved before production deployment. [9]
- Configuration: Restrict webhook tool target URLs to an operator-maintained allowlist verified at the infrastructure layer rather than relying solely on LLM behavioral compliance. [9]
- Engineering: Build a reverse-proxy layer between the agent platform and MCP or webhook endpoints that enforces request-level access control and payload size limits. [9]
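The allowlist and reverse-proxy tips above both depend on deciding, outside the LLM, whether a target URL is acceptable. A minimal sketch of that decision follows; the hostnames in `ALLOWED_HOSTS` and the function name are assumptions for illustration, not values from the vendor documentation.

```python
from urllib.parse import urlsplit

# Hypothetical operator-maintained allowlist of webhook hosts.
ALLOWED_HOSTS = frozenset({"api.internal.example.com", "hooks.example.com"})


def is_allowed_target(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Allow only HTTPS URLs whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # `hostname` is lowercased and excludes userinfo and port, so
    # "https://hooks.example.com@evil.test/" is judged by "evil.test".
    return parts.hostname in allowed_hosts
```

Exact hostname matching is deliberate: suffix or substring matching would accept look-alike hosts such as `hooks.example.com.evil.test`.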
Action Controls
Action controls govern which tools and actions the agent can invoke autonomously.
- Policy: Require Always Ask approval mode for all newly integrated MCP servers until trust is established through documented security review and penetration testing. [9]
- Configuration: Configure fine-grained per-tool approval requiring explicit confirmation for any tool accessing sensitive data stores or triggering financial transactions. [9]
- Engineering: Implement a pre-execution webhook validator that inspects tool call parameters against operator-defined allowlists before the outbound request fires. [9]
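The pre-execution validator described above can be sketched as a pure function that compares a proposed tool call against an operator-defined policy. The tool names, parameter names, and the shape of `TOOL_POLICY` are hypothetical; a real deployment would derive them from its own tool definitions.

```python
from typing import Any, Optional

# Hypothetical policy: for each permitted tool, the permitted parameter names
# and, where relevant, the permitted values (None means any value).
TOOL_POLICY: dict[str, dict[str, Optional[tuple]]] = {
    "lookup_order": {"order_id": None},
    "issue_refund": {"order_id": None, "currency": ("USD", "EUR")},
}


def validate_tool_call(tool: str, params: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means the call may proceed."""
    policy = TOOL_POLICY.get(tool)
    if policy is None:
        return [f"tool {tool!r} is not on the allowlist"]
    violations = []
    for name, value in params.items():
        if name not in policy:
            violations.append(f"unexpected parameter {name!r}")
        elif policy[name] is not None and value not in policy[name]:
            violations.append(f"value {value!r} not permitted for {name!r}")
    return violations
```

Returning the full list of violations, rather than failing on the first, gives the operator a complete record of each rejected call for later review.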
Output Guardrails
Output guardrails inspect what the agent sends to other systems and users.
- Policy: Require Zero Retention Mode for all agents handling sensitive caller data and document the enforcement policy across team deployments and compliance audits. [8]
- Configuration: Enable conversation history redaction for every agent whose callers may disclose PII and configure entity detection patterns for domain-specific sensitive data. [8]
- Engineering: Build an egress-filtering proxy for webhook and MCP traffic that blocks outbound requests containing credential patterns, PII markers, or sensitive business data. [9]
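The core of the egress-filtering proxy described above is a payload scan. The sketch below shows that scan in isolation, assuming the proxy has already serialized the outbound body to text. The four patterns are illustrative examples only and are far from exhaustive; the names `SENSITIVE_PATTERNS`, `find_sensitive`, and `should_block` are hypothetical.

```python
import re

# Illustrative patterns only; real deployments need patterns tuned to
# their own credential formats and PII categories.
SENSITIVE_PATTERNS = {
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{16,}=*", re.IGNORECASE),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive(payload: str) -> list[str]:
    """Return the names of all patterns that match the outbound payload."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(payload)]


def should_block(payload: str) -> bool:
    """Block the request if any sensitive pattern is present."""
    return bool(find_sensitive(payload))
```

Pattern matching of this kind catches only well-formed, unencoded values, so it complements rather than replaces the URL allowlisting and per-tool approval controls listed earlier.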
Monitoring
Monitoring captures what the agent did and surfaces anomalies for review.
- Policy: Establish an incident-response runbook for guardrail trigger events that defines escalation thresholds and investigation procedures for sustained manipulation attempts. [12]
- Configuration: Forward guardrail trigger events and evaluation criteria results to the organizational SIEM for correlation with broader security telemetry and alerting. [12]
- Engineering: Implement custom evaluation criteria that flag conversations where tool calls access sensitive resources without matching expected caller authentication context. [8]
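The evaluation and forwarding tips above can be combined into a small post-conversation check. The sketch below assumes a simplified conversation record; the `Conversation` fields, the tool names in `SENSITIVE_TOOLS`, and the event schema are all hypothetical and do not reflect the vendor's log format.

```python
import json
from dataclasses import dataclass, field

# Hypothetical set of tools that touch sensitive resources.
SENSITIVE_TOOLS = frozenset({"get_account_balance", "update_billing_address"})


@dataclass
class Conversation:
    """Minimal, assumed shape of a conversation record."""
    conversation_id: str
    caller_authenticated: bool
    tool_calls: list[str] = field(default_factory=list)


def flag_unauthenticated_access(conv: Conversation) -> list[str]:
    """Return sensitive tools invoked while the caller was unauthenticated."""
    if conv.caller_authenticated:
        return []
    return [tool for tool in conv.tool_calls if tool in SENSITIVE_TOOLS]


def to_siem_event(conv: Conversation, flagged: list[str]) -> str:
    """Serialize a finding as one JSON line suitable for log shipping."""
    return json.dumps({
        "event": "sensitive_tool_without_auth",
        "conversation_id": conv.conversation_id,
        "tools": flagged,
    }, sort_keys=True)
```

Emitting one JSON object per finding keeps the output compatible with most log shippers, which can then forward it to the SIEM alongside guardrail trigger events.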
7 References
The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [7][12]) refer to entries below, listed in citation order.
Selected Vulnerabilities
- [1] Path traversal in elevenlabs-mcp resource handler: Absolute paths bypass base_dir restriction in the official ElevenLabs MCP server, allowing arbitrary file reads from the host process.
- [2] Path traversal fix PR: Applies base_dir containment check to both absolute and relative paths in the resource handler, closing the traversal bypass.
- [3] WebRTC CSP bypass in ElevenLabs client SDK: WebRTC path ignores workletPaths config, forcing Content Security Policy to allow blob and data URIs in script-src, weakening XSS protection.
Selected Research
- [4] Aegis red-teaming framework for AI voice agents: First systematic red-teaming framework for voice agents modeling authentication bypass, privacy leakage, resource abuse, privilege escalation, and data poisoning scenarios.
- [5] WhisperInject adversarial audio attack: Two-stage adversarial audio framework achieving 60-78 percent attack success rate against multimodal LLMs by embedding harmful payloads in benign-sounding audio.
- [6] AudioJailbreak against end-to-end LALMs: Universal audio jailbreak attack effective against GPT-4o-Audio and bypassing Llama-Guard-3 safeguard in weak-adversary scenarios.
- [7] ElevenLabs safety framework for AI voice agents: Vendor-published safety guidance covering system prompt extraction protection, end-call dead switch, and simulation-based red teaming approaches.
Vendor Documentation
- [8] ElevenAgents product overview: Core product documentation describing agent architecture, supported LLMs, voice configuration, knowledge base, tools, and SDK integration paths.
- [9] MCP integration security: Vendor security guidance documenting tool approval modes, prompt injection risks from external MCP servers, and operator responsibility boundaries.
- [10] Agent authentication documentation: Documents signed URL and hostname allowlist authentication mechanisms for restricting agent access to authorized sources.
- [11] ElevenLabs Trust Center: Central compliance portal hosting SOC 2 Type 2 report, technical and organizational security measures, and certification documentation.
- [12] Guardrails 2.0 documentation: Documents the Alpha-stage guardrails layer including Manipulation detection for prompt injection, Focus, Content, and Custom guardrails with configurable exit strategies.
- [13] Knowledge base documentation: Describes RAG-indexed document storage supporting PDF, TXT, DOCX, HTML, EPUB uploads and URL scraping for domain-specific agent grounding.
Other Sources
- [14] Flanking Attack against multimodal LLMs: Semi-automated voice-based prompt injection technique embedding adversarial content between benign queries to bypass content moderation filters.
- [15] Real-world AI voice cloning vishing attack: Red team case study demonstrating ElevenLabs voice cloning used to impersonate IT staff and capture MFA credentials via social engineering.
- [16] Hidden audio attacks on voice AI pipelines: Technical analysis of adversarial audio attacks targeting the transcription-to-LLM gap in voice AI pipelines, with defense patterns.
- [17] MCP integration overview: Documents MCP server connectivity via SSE and HTTP streamable transports, tool discovery, and workspace-level configuration.