1 Key Risks

The most critical security risks an operator inherits when deploying this agent in its documented default configuration. Vapi's dominant risk concentration is the absence of default input guardrails on an untrusted voice channel combined with unrestricted tool execution and outbound network egress from the code sandbox.

Key Input Risks

Untrusted caller speech reaches the LLM reasoning loop without input filtering on the documented default configuration. The vendor's own prompting guide warns that prompt-level controls are insufficient and can be jailbroken, directing operators to enforce security-sensitive constraints on the server side. [1][2]

Key Execution Risks

The sandboxed code-execution environment restricts file system access and limits network to outbound HTTP, but custom function tools execute on the operator's own infrastructure without platform-level containment. No default approval gate stands between a tool call decision and its execution. [3][10][11]

Key Action Risks

Tool calls, outbound phone calls, and call transfers proceed without per-action operator approval on the default configuration. The tool rejection plan feature exists but is opt-in, leaving the LLM as the sole arbiter of when tools fire. [4][12]

Key Output Risks

Voice output transmits agent responses to the caller in real time without documented credential redaction or data-loss prevention in the default configuration. Environment variables holding API keys pass through Code Tool outputs into LLM context and may surface in spoken responses. [10]

Key Monitoring Risks

Structured call logging, transcription, and automated analysis ship as defaults, and the platform holds SOC 2 Type II certification. No behavioral anomaly detection, automated security alerting, or SIEM forwarding is documented, leaving incident detection to operator-managed tooling. [6][14]

2 AIRQ Scores

The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. Vapi lands in the lower-risk quadrant with a contained blast radius and minimal defense controls on vendor-documented defaults, and the trifecta floor is the primary driver of the attack surface score.

AIRQ Metrics

AIRQ Score1.71

Blast Radius2.63

Attack Surface4.8

Defense Controls2

The combination of a moderate attack surface pushed above the raw value by the trifecta convergence and a structurally low blast radius from the absence of shell, file system, and deployment primitives places Vapi in the lightweight quadrant with a short hardening path.

Attack Surface and Blast Radius are each scored on a ten-point scale, Defense Controls on a fifteen-point scale, and the composite AIRQ Score weights defense as a multiplier on capability per unit of risk. A lower AIRQ Score indicates that the agent's defense controls and limited blast radius offset a larger share of the attack surface exposure.

Metric	Score	Comments
AIRQ Score	1.71	Composite reflects a lightweight agent whose contained blast radius and partial defense offset a trifecta-elevated attack surface.
Blast Radius	2.63 / 10	Sandboxed code execution with no file system, no shell, and no deployment access confines the blast to outbound HTTP and credential environment variables.
Attack Surface	4.8 / 10	Trifecta-complete convergence of untrusted input, sensitive data, and outbound egress applies the floor, elevating the raw score above what individual surface bands alone would produce.
Defense Controls	2 / 15	Vendor documents a sandboxed code runtime and structured call logging as defaults; all other controls including input guardrails, action gates, and output filtering are opt-in.

3 Attack Surface

Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. The dominant exposure for Vapi is the unfiltered voice channel that feeds caller speech directly into the LLM reasoning loop, with the trifecta convergence elevating the composite above the raw per-surface scores.

Attack Surface Metrics

User Input3

Tool Execution2

External Data2

Orchestration2

Memory1

Inter-Agent2

Reasoning2

Output Processing1

Planning1

Configuration1

One surface reaches the highest band among the ten surfaces and eight sit in the middle bands, with only memory and three others at the floor, reflecting a voice-first agent whose input channel is its largest exposure.

Each row maps a scored entry point to the architectural exposure it creates and the evidence anchoring the band assignment.

Surface	Score	Comments
User Input	3 / 4	Caller speech over phone and WebRTC reaches the LLM without default input filtering; the vendor acknowledges the prompt boundary can be bypassed. [1]
External Data	2 / 4	Knowledge base files and custom retrieval server outputs feed into the reasoning loop from operator-configured sources without documented content validation. [17]
Memory	1 / 4	Session-scoped conversation context only; no cross-session persistence or automated memory writes are documented. [7]
Reasoning	2 / 4	Multi-step reasoning operates within the declared task scope with full transcript visibility, though the model-agnostic architecture delegates the reasoning chain to the operator-selected LLM. [20]
Planning	1 / 4	Agents follow developer-defined workflows and handoff rules rather than autonomously decomposing tasks at runtime. [20]
Tool Execution	2 / 4	Code Tool runs sandboxed TypeScript with no file system access, limited memory, and outbound HTTP only; custom function tools execute on operator-hosted endpoints. [10][11]
Orchestration	2 / 4	Squads orchestrate multi-assistant handoffs within a single call session; scheduled outbound calls require explicit API invocation by the operator. [13]
Inter-Agent	2 / 4	Squad members share context through handoff tools with configurable context engineering; no open marketplace or external agent ecosystem connects by default. [13]
Output Processing	1 / 4	Voice-only output limits exfiltration to the audio channel; no rich rendering, URL embedding, or markdown image side channels are present. [14]
Configuration	1 / 4	Configuration is operator-controlled through the API and dashboard; no auto-loaded project files, community plugin marketplace, or config-based security bypass is documented. [6]

The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. Vapi exhibits all three on the documented default: a caller's speech can reach the LLM that holds conversation transcripts and operator-configured API keys stored in the code execution environment, and the same session's Code Tool can transmit bytes outbound via unrestricted HTTP without crossing any platform-level control.

Lethal Trifecta · Complete (3 of 3)

Vapi exhibits all three of these conditions in its documented default configuration:

Untrusted input — Caller speech over phone and WebRTC feeds directly into the LLM reasoning loop without default input filtering. [1]
Sensitive data — Call recordings, transcripts, and operator-configured environment variables holding API keys are accessible within the session context. [7][10]
External egress — The Code Tool supports outbound HTTP to any destination, and the voice channel delivers spoken output directly to the caller without interception. [10]

4 Blast Radius

The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. Vapi's compromise radius is structurally bounded by the absence of file system access, shell execution, and infrastructure primitives, concentrating blast in outbound HTTP and credential environment variables.

Blast Radius Metrics

Code execution1

Credential access2

File system access0

Autonomous action1

Network access2

Deployment access0

Network access and credential exposure are the two highest-scored factors at the middle band, with file system and deployment access absent entirely and no factor reaching the upper band.

Each row ties the scored factor to the specific capability the agent exercises on the operator's behalf and the evidence anchoring the band.

Factor	Score	Comments
Code execution	1 / 4	Code Tool provides sandboxed TypeScript with no shell, no file system, and a configurable timeout capping execution duration. [10]
File system access	0 / 4	No file system access is documented for the Code Tool or any other agent-side component; artifacts are managed via the platform API. [10]
Network access	2 / 4	Outbound HTTP and HTTPS from the Code Tool are unrestricted by domain; no documented SSRF protection or egress allowlist is in place by default. [10]
Credential access	2 / 4	Operators place API keys and service credentials in Code Tool environment variables; when the LLM triggers code execution, the running script can read those credentials and exfiltrate them via outbound HTTP. [10]
Autonomous action	1 / 4	Actions are call-scoped; scheduled outbound calls require explicit API invocation and do not fire autonomously without operator action. [13]
Deployment access	0 / 4	No deployment, infrastructure modification, or package publishing capability is documented for any Vapi agent component. [6]

5 Defense Controls

Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. Vapi ships structured call logging and a sandboxed code runtime as defaults; input guardrails, action gates, and output filtering exist as opt-in features that the operator must explicitly enable.

Defense Controls Metrics

Input Guardrails0

Execution Isolation1

Action Controls0

Output Guardrails0

Monitoring1

Two components contribute to the defense total where higher scores indicate stronger protection, with input, action, and output controls at zero on the documented default posture.

Each component is scored against the vendor's documented default configuration; opt-in controls appear as hardening tips rather than baseline scores.

Component	Score	Comments
Input Guardrails	0 / 3	Security filter plans detecting SQL injection, XSS, SSRF, and prompt injection are documented but disabled by default; no input reaches a filter before the LLM processes it. [16]
Execution Isolation	1 / 3	Code Tool executes in an isolated sandbox with no file system and no shell; the compute boundary earns the isolation credit while the unrestricted outbound HTTP egress is scored separately under blast radius. [10]
Action Controls	0 / 3	Tool rejection plans allow conditional blocking of tool calls via pattern-matching and template-based conditions, but must be explicitly attached to each tool; no default deny or approval gate is in place. [12]
Output Guardrails	0 / 3	PCI and HIPAA compliance modes prevent data storage during sensitive phases but are opt-in; no default credential redaction, DLP, or exfiltration blocking is documented. [8][9]
Monitoring	1 / 3	Call recording, structured logging, transcription, and automated call analysis are enabled by default; SOC 2 Type II is certified; no SIEM forwarding or anomaly detection is documented. [6][14][15][19]

6 Hardening Tips

Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. The highest-leverage changes are enabling the built-in security filter plans for input guardrails and attaching tool rejection plans to every tool definition to gate autonomous execution.

Input Guardrails

Input guardrails intercept adversarial content before it reaches the reasoning loop.

Input Guardrails

Policy Require all production assistants to pass through security filter plan review before deployment — counters User Input at the upper band with zero default filtering.
Configuration Enable the security filter plan with prompt-injection and XSS detection set to reject mode on every assistant — counters the absent default input guardrails.
Engineering Deploy a pre-LLM classifier layer using the custom transcriber webhook to intercept and score transcripts before they reach the reasoning loop — counters the probabilistic prompt boundary.

Execution Isolation

Execution isolation contains what a compromised agent can do on the host.

Execution Isolation

Policy Mandate that all Code Tool deployments use the minimum timeout and restrict environment variables to non-sensitive values — counters credential exposure in the sandbox.
Configuration Configure custom bucket storage to keep call artifacts off Vapi infrastructure and reduce the vendor's data surface — counters the default data retention posture.
Engineering Request egress allowlisting from Vapi for production Code Tool deployments, or restrict Code Tool usage to scenarios where outbound HTTP is not required — counters unrestricted egress from the vendor-hosted sandbox.

Action Controls

Action controls govern which tools and actions the agent can invoke autonomously.

Action Controls

Policy Establish a policy requiring every tool definition to carry a rejection plan with explicit trigger conditions — counters the absence of default approval gates.
Configuration Attach rejection plans to all custom function tools, handoff tools, and MCP server endpoints with conditions matching sensitive keywords — counters autonomous tool execution and untrusted MCP supply-chain risk.
Engineering Build a server-side webhook interceptor that validates tool-call parameters against an allowlist before returning results — counters unrestricted tool parameter injection.

Output Guardrails

Output guardrails inspect what the agent sends to other systems and users.

Output Guardrails

Policy Require HIPAA or PCI compliance mode on all assistants handling regulated data to prevent default recording and transcript storage — counters the default artifact retention. [18]
Configuration Enable the recording consent plan on all production assistants to enforce caller consent before any audio capture — counters uncontrolled data collection.
Engineering Implement a server-side response filter on the webhook endpoint that strips sensitive tokens from tool results before they reach the LLM context — counters credential leakage through voice output.

Monitoring

Monitoring captures what the agent did and surfaces anomalies for review.

Monitoring

Policy Forward all call analysis results and end-of-call reports to the enterprise SIEM for centralized alerting — counters the absence of automated security event detection.
Configuration Configure webhook subscriptions for all status-update and hang-notification events to surface anomalous call patterns — counters silent failures and undetected abuse.
Engineering Build a call-analysis post-processor that flags conversations matching known injection patterns and triggers automated alert workflows — counters the lack of behavioral anomaly detection. [5]

7 References

The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [7, 13]) refer to entries below, listed in citation order.

Selected Vulnerabilities

Vapi Prompting Guide — Jailbreak Acknowledgment Vendor documentation explicitly states that the prompt is probabilistic and can be jailbroken, recommending server-side enforcement for security-sensitive values rather than relying on prompt-level controls.

Selected Research

Voice Agent Jailbreaks 2026 Class-level analysis of production voice agent jailbreak patterns including role reframe attacks and indirect prompt injection via CRM notes and tool outputs.
OWASP Top 10 for LLM Applications 2025 Comprehensive framework covering prompt injection, excessive agency, system prompt leakage, and unbounded consumption risks across all LLM-based applications.
OWASP Top 10 for Agentic Applications 2026 Framework addressing agent goal manipulation, tool misuse, identity spoofing, and supply chain vulnerabilities in autonomous AI applications.
CSA Agentic Framework Rapid Exploitation Research note documenting sub-four-hour weaponization of agentic framework CVEs after public disclosure, highlighting the urgency of authentication and authorization defaults.

Vendor Documentation

Vapi Trust Center SafeBase-powered security portal presenting risk profile, product and data security controls, subprocessor list, and the executive summary of an independent penetration test.
Vapi Data Flow Documentation Documents ephemeral audio processing, artifact retention policy, HIPAA and PCI mode behavior, and the custom bucket storage option for call recordings and logs.
Vapi HIPAA Compliance Describes the opt-in HIPAA mode that disables call data storage and restricts pipeline components to HIPAA-compliant providers under a BAA.
Vapi PCI Compliance Documents PCI DSS Level 1 compliance mode for payment data handling, including artifact disabling during sensitive collection phases via squad-based architecture.
Vapi Code Tool Documents the sandboxed TypeScript execution environment with no file system access, limited memory, outbound HTTP only, and configurable timeout from ten to sixty seconds.
Vapi Custom Tools Documents the webhook-based custom tool integration where operator-hosted servers receive tool call payloads and return structured results to the agent.
Vapi Tool Rejection Plan Documents the opt-in conditional rejection mechanism for tool calls using regex, Liquid templates, and nested group logic to block execution based on conversation state.
Vapi Squads Documentation Documents multi-assistant orchestration within a single call using handoff tools with configurable context engineering and variable extraction.
Vapi Call Recording and Logging Documents the default-on artifact plan system for call recording, transcription, and structured logging with dashboard and API access.
Vapi Call Analysis Documents automated post-call summarization and success evaluation using Claude Sonnet with GPT-4o fallback, attached to the call record.

Other Sources

Vapi Security Filter Plans Changelog Changelog entry introducing security filter plans for SQL injection, XSS, SSRF, and prompt injection detection with sanitize, reject, and replace modes.
Vapi Knowledge Base Documentation Documents the RAG pipeline supporting file upload in multiple formats and custom knowledge base integration via webhook-style retrieval servers.
Vapi Privacy Policy Discloses data processing as a data processor on behalf of customers, retention policies, subprocessor relationships, and aggregated statistics collection.
Vapi GDPR Compliance Documents GDPR compliance including data subject rights, analytics subprocessors, penetration testing practices, and role-based access control testing.
Vapi Workflows Overview Documents the visual conversation flow builder with deterministic node-and-edge routing, global escalation nodes, and API request integration.