Devin Agent Security Risks

Coding Agents · devin.ai · Exposed Giants
[Figure: AI Risk Quadrant position. Axes: Defense Controls (3) and Attack Surface (7.62). Quadrants: Exposed Giants, Fortified Leaders, Humble Providers, Tight Operators.]
AIRQ Score: 5.57 (High)
Attack Surface: 7.62 (Critical)
Blast Radius: 8.5 (Critical)
Defense Controls: 3 (Critical)
About The Agent

Devin is a cloud-hosted autonomous coding agent that operates inside a per-session VM with full shell, browser, and IDE access. The same operator-scoped runtime accepts prompts from a web interface, Slack, GitHub triggers, and any MCP-compatible client, then executes code, browses the web, and manages secrets without per-action approval in the default configuration. A persistent Knowledge system survives across sessions, and an MCP marketplace connects the agent to dozens of external services that share the same execution authority.

About the AI Risk Quadrant

Exposed Giants placement reflects a high attack surface driven by demonstrated indirect prompt injection leading to full DevBox compromise, combined with a high blast radius from unrestricted shell execution, credential access, and network egress. Defense controls contribute almost nothing on the documented default: no input guardrails, no action approval gates, and output redaction that covers only the chat display while leaving shell and browser channels open. Operators inherit broad capability exposure with minimal vendor-provided containment.

1 Key Risks

The most critical security risks an operator inherits when deploying this agent in its documented default configuration. Devin concentrates risk at the intersection of unrestricted tool execution, absent input filtering, and multiple uncontrolled egress channels on the documented cloud default.

Key Input Risks
Untrusted content from GitHub issues, websites, and MCP marketplace tools reaches Devin without any prompt shield or injection detection. Independent red-team research confirmed Devin follows adversarial instructions embedded in external web content without resistance. [2]
Key Execution Risks
Full shell, browser, and code execution run in a DevBox VM with no per-action approval gate on the cloud default configuration. Demonstrated compromise turned the DevBox into a remote-controlled agent running attacker malware with operator-level privilege. [2]
Key Action Risks
Cloud sessions execute tool calls autonomously without operator approval; the Terminal product offers a single-step bypass removing all gates. Secrets loaded as environment variables become accessible to any injected instruction without a secondary authorization boundary. [3]
Key Output Risks
Multiple exfiltration channels operate without output filtering: shell commands, browser navigation, image rendering, link unfurling, and ASCII smuggling all bypass the frontend secret redaction. Demonstrated research showed secrets leaving the DevBox through each channel independently. [3]
Key Monitoring Risks
Enterprise audit logs record session-level events via API but provide no behavioral anomaly detection or automated response to suspicious tool invocations. Operators should forward audit events to a SIEM with alerting rules to gain visibility the vendor does not provide by default. [8]

2 AIRQ Scores

The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. Devin lands in the critical-risk band with attack surface and blast radius both in the upper range while vendor-provided defense controls remain near the floor.

AIRQ Metrics

The combination places Devin in a posture where operators must layer significant external containment before production deployment — network egress restriction, input validation, and action approval gates are prerequisites rather than enhancements.

Demonstrated exploitation of the DevBox via prompt injection, a patched session-access CVE, and confirmed multi-channel exfiltration anchor the upper bands while vendor-provided defense remains near the floor.

Metric | Score | Comments
AIRQ Score | 5.57 | Critical-band composite driven by demonstrated exploitation paths against broad default capability with minimal vendor containment.
Blast Radius | 8.5 / 10 | Full shell execution, unrestricted network including public port exposure, and access to all operator secrets drive the blast radius near maximum.
Attack Surface | 7.62 / 10 | Demonstrated indirect prompt injection, session-URL CVE, and multi-channel exfiltration anchor the upper attack-surface band, with the Lethal Trifecta complete.
Defense Controls | 3 / 15 | No input guardrails, no action approval on the cloud default, and output redaction limited to frontend display leave the defense floor near zero.

3 Attack Surface

Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. The dominant exposures are unrestricted external data ingestion, full tool execution without approval, and session-level configuration whose security relied on URL secrecy.

Attack Surface Metrics

External Data, Tool Execution, and Configuration sit at the adjusted ceiling with demonstrated exploitation evidence; Output Processing, Memory, and User Input carry elevated architectural exposure across remaining surfaces.

Each row scores an entry point from zero to five: the architectural exposure band is shown out of four, and demonstrated exploitation evidence can raise a score above that band to the adjusted ceiling of five.

Surface | Score | Comments
User Input | 3 / 4 | Multiple unvalidated channels including web, Slack, GitHub, API, and MCP with no instruction hierarchy or injection detection. [15]
External Data | 5 / 4 | Auto-ingests content from websites, GitHub issues, and repository config files without validation; demonstrated indirect prompt injection from web content. [2]
Memory | 3 / 4 | Persistent Knowledge system with trigger-based auto-retrieval, automated writes from repo structure, and no integrity verification or poisoning detection. [10]
Reasoning | 3 / 4 | Multi-step reasoning delegated to the Brain service with partial transparency; no independent chain-of-thought verification mechanism documented. [9]
Planning | 3 / 4 | Autonomous task decomposition with parallel session delegation, scheduling, and playbook-driven automation without per-plan operator approval. [17]
Tool Execution | 5 / 4 | Full shell, browser, and code execution in DevBox VM with no approval gate; demonstrated compromise to remote C2 via prompt-injected malware download. [2]
Orchestration | 3 / 4 | Spawns parallel managed sessions, supports cron scheduling, and runs headless via MCP without supervision or independent kill switch beyond budget cap. [17]
Inter-Agent | 3 / 4 | MCP server and marketplace enable any compatible client to create sessions and manage knowledge without inter-agent message authentication. [13]
Output Processing | 4 / 4 | Secret redaction covers frontend display only; demonstrated exfiltration via shell commands, web requests, rendered images, URL previews, and Unicode steganography bypasses the redaction on every non-UI channel. [3]
Configuration | 5 / 4 | CVE-2024-56083 demonstrated session URL discovery granting unauthorized write access; auto-loads .cursorrules and .rules files from repos without validation. [1]
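
As a reading aid, the rows above can be tabulated to show which surfaces exceed the four-point band. This is a minimal Python sketch: the dictionary only restates the scores from the table, and the names `SURFACE_SCORES`, `BAND_MAX`, and `above_band` are illustrative assumptions, not part of any AIRQ tooling. It does not attempt to reproduce how the 7.62 headline score is derived.

```python
# Scores copied from the Attack Surface table above (each shown out of 4).
SURFACE_SCORES = {
    "User Input": 3,
    "External Data": 5,
    "Memory": 3,
    "Reasoning": 3,
    "Planning": 3,
    "Tool Execution": 5,
    "Orchestration": 3,
    "Inter-Agent": 3,
    "Output Processing": 4,
    "Configuration": 5,
}

BAND_MAX = 4  # denominator shown in the table


def above_band(scores, band_max=BAND_MAX):
    """Return the surfaces whose score exceeds the architectural band."""
    return sorted(name for name, score in scores.items() if score > band_max)


print(above_band(SURFACE_SCORES))
# ['Configuration', 'External Data', 'Tool Execution']
```

The three surfaces returned are the ones the report describes as sitting at the adjusted ceiling.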

The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. Devin processes untrusted web content and repository files, holds operator secrets as environment variables, and transmits data through unrestricted shell and browser channels without crossing any filtering boundary.

Lethal Trifecta · Complete (3 of 3)

Devin exhibits all three of these conditions in its documented default configuration:

  • Untrusted input — GitHub issues, websites, and repository configuration files carry adversary-authored instructions directly into the reasoning loop. [2]
  • Sensitive data — All operator secrets loaded via the Secrets Manager are accessible as environment variables within the DevBox session. [3]
  • External egress — Shell commands, browser navigation, image rendering, and the expose_port tool all send bytes to attacker-controlled endpoints. [4]
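
The three conditions listed above can be expressed as a simple check that an operator might run against their own session profiles. This is a sketch under stated assumptions: `SessionProfile` and its field names are hypothetical, and the values for the default profile restate the report's findings rather than anything queried from Devin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionProfile:
    """Hypothetical summary of what a single agent session can do."""
    reads_untrusted_content: bool
    holds_sensitive_data: bool
    can_egress_externally: bool


def trifecta_count(profile: SessionProfile) -> int:
    """Count how many of the three trifecta conditions are present."""
    return sum(
        [
            profile.reads_untrusted_content,
            profile.holds_sensitive_data,
            profile.can_egress_externally,
        ]
    )


def is_lethal_trifecta(profile: SessionProfile) -> bool:
    """The trifecta is complete only when all three conditions hold."""
    return trifecta_count(profile) == 3


# Documented default posture as described in this report.
devin_default = SessionProfile(True, True, True)
# Same session with outbound network blocked (an operator-side control).
egress_blocked = SessionProfile(True, True, False)

print(trifecta_count(devin_default), is_lethal_trifecta(devin_default))
print(trifecta_count(egress_blocked), is_lethal_trifecta(egress_blocked))
```

Removing any one condition, for example by restricting egress, breaks the full chain even if the other two remain.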

4 Blast Radius

The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. Compromise of the DevBox reaches full shell execution, all operator secrets, and unrestricted outbound network including the ability to publish services to the public internet.

Blast Radius Metrics

Three of six factors sit at the maximum band (code execution, network access, and credential access), two score 3 of 4, and deployment access scores 2 of 4, meaning a compromised session gains capabilities equivalent to an operator with unrestricted access to the host environment.

Each factor scores the damage reach if an attacker gains control of the agent session through any attack surface above.

Factor | Score | Comments
Code execution | 4 / 4 | Full shell with operator-level privilege in DevBox VM; demonstrated Sliver C2 execution giving remote command-and-control access. [2][14]
File system access | 3 / 4 | Read-write access to all files within the DevBox VM including installed packages, source code, and configuration files. [12]
Network access | 4 / 4 | Unrestricted outbound internet by default; expose_port tool publishes any local service to a public .devinapps.com URL without approval. [4]
Credential access | 4 / 4 | All secrets from the Secrets Manager loaded as environment variables; demonstrated exfiltration of API keys and AWS credentials. [3]
Autonomous action | 3 / 4 | Sessions run autonomously until completion or ACU budget cap; scheduled recurring sessions execute without per-invocation approval. [17]
Deployment access | 2 / 4 | Creates pull requests and pushes code via GitHub integration; deployment requires merge approval outside the agent boundary. [15]

5 Defense Controls

Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. The default posture ships per-session VM isolation and frontend secret redaction but no input filtering, no action approval gates, and no behavioral anomaly detection.

Defense Controls Metrics

The near-floor total reflects that operators must independently deploy input validation, action approval gates, and network egress filtering to achieve containment the vendor does not provide on the documented default.

The vendor provides compute isolation via per-session VMs and redacts secrets in the chat UI, but the default configuration lacks input filtering, action approval on the cloud product, and, unless the operator configures SIEM integration, behavioral anomaly detection.

Component | Score | Comments
Input Guardrails | 0 / 3 | No prompt shield, injection detection, or content filter documented; independent security assessments confirmed the agent follows adversarial instructions without resistance. [2][5]
Execution Isolation | 1 / 3 | Per-session VM isolation with session destruction on completion, but unrestricted internet access within the VM negates network containment. [12]
Action Controls | 0 / 3 | Cloud default has no per-action approval gate; Terminal offers Normal mode with prompts but a single /bypass command removes all gates. [11]
Output Guardrails | 1 / 3 | Frontend displays secrets as redacted placeholders but shell, browser, and image channels pass data unfiltered to external destinations. [3]
Monitoring | 1 / 3 | Enterprise audit logs API provides session-level event history but no real-time anomaly detection or automated incident response. [7][8]
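
The component scores above add up to the headline Defense Controls figure of 3 out of 15. The short Python check below restates that arithmetic; the variable names are illustrative and the values are copied from the table.

```python
# Component scores copied from the Defense Controls table (each out of 3).
DEFENSE_COMPONENTS = {
    "Input Guardrails": 0,
    "Execution Isolation": 1,
    "Action Controls": 0,
    "Output Guardrails": 1,
    "Monitoring": 1,
}
MAX_PER_COMPONENT = 3

total = sum(DEFENSE_COMPONENTS.values())
ceiling = MAX_PER_COMPONENT * len(DEFENSE_COMPONENTS)

print(f"Defense Controls: {total} / {ceiling}")  # Defense Controls: 3 / 15
```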

6 Hardening Tips

Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. Operators should prioritize network egress restriction, input validation deployment, and enforced sandbox mode to break the trifecta conditions on the default posture. [6]

Input Guardrails

Input guardrails intercept adversarial content before it reaches the reasoning loop.

  • Policy: Require human review of all external content before Devin processes GitHub issues or documentation from untrusted sources. Counters External Data at the adjusted ceiling.
  • Configuration: Configure Knowledge triggers to exclude auto-retrieval from repositories with external contributors. Counters Memory persistence without integrity verification.
  • Engineering: Deploy a prompt-injection detection proxy between untrusted data sources and the Devin API to filter adversarial instructions. Counters the complete absence of input guardrails.
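
A detection proxy like the one in the engineering tip above could start from a very small screening step. The sketch below is a heuristic illustration only, not a complete injection detector: it strips invisible code points from the Unicode Tags block (the range used in ASCII smuggling) and flags a couple of example phrases. The function name `screen_external_text` and the phrase list are assumptions made for this example.

```python
import re

# Unicode Tags block (U+E0000 to U+E007F): invisible code points that can
# hide instructions inside otherwise normal-looking text.
TAG_CHARS = re.compile("[\U000E0000-\U000E007F]")

# Illustrative phrases only; a real detector needs far broader coverage.
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "ignore all previous instructions",
)


def screen_external_text(text: str) -> dict:
    """Strip hidden tag characters and flag obvious injection phrases."""
    hidden = TAG_CHARS.findall(text)
    cleaned = TAG_CHARS.sub("", text)
    lowered = cleaned.lower()
    hits = [phrase for phrase in SUSPICIOUS_PHRASES if phrase in lowered]
    return {
        "cleaned": cleaned,
        "hidden_chars": len(hidden),
        "phrase_hits": hits,
        "blocked": bool(hidden or hits),
    }


# Build a string with two hidden tag characters appended to visible text.
smuggled = "Fix the bug" + "".join(chr(0xE0000 + ord(c)) for c in "hi")
print(screen_external_text(smuggled)["blocked"])       # True
print(screen_external_text("Fix the bug")["blocked"])  # False
```

Phrase matching of this kind is easy to evade, so it is best treated as one signal feeding human review rather than a gate on its own.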

Execution Isolation

Execution isolation contains what a compromised agent can do on the host.

  • Policy: Mandate VPC deployment with network egress restriction for all production Devin usage. Adds network-level containment beyond the existing per-session VM compute isolation on the default cloud posture.
  • Configuration: Enable sandbox enforcement as Required in Team Settings for all Terminal users. Counters the Optional sandbox default that leaves OS-level isolation disabled. [16]
  • Engineering: Implement network egress filtering with domain allowlists at the VPC level to restrict DevBox outbound connections. Counters unrestricted network blast radius.
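
The domain-allowlist idea in the engineering tip above comes down to a strict suffix match on the destination hostname. The Python sketch below shows that matching logic only; real enforcement would live in a VPC firewall or egress proxy, and the `ALLOWED_DOMAINS` entries and the function name `is_egress_allowed` are hypothetical examples.

```python
# Hypothetical allowlist; an operator would substitute their own domains.
ALLOWED_DOMAINS = {"github.com", "pypi.org"}


def is_egress_allowed(hostname: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Allow a host only if it equals an allowed domain or is a subdomain of it."""
    host = hostname.lower().rstrip(".")
    return any(host == domain or host.endswith("." + domain) for domain in allowed)


for candidate in ("api.github.com", "evilgithub.com", "github.com.attacker.example"):
    print(candidate, is_egress_allowed(candidate))
```

Matching on `"." + domain` rather than a bare suffix is the important detail: it rejects look-alike hosts such as `evilgithub.com` that a naive `endswith("github.com")` test would accept.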

Action Controls

Action controls govern which tools and actions the agent can invoke autonomously.

  • Policy: Establish an organizational policy requiring Normal permission mode and prohibiting /bypass usage for all Terminal sessions. Counters single-step bypass of all approval gates.
  • Configuration: Configure Team Settings permission deny rules blocking sensitive operations like expose_port and credential access by default. Counters autonomous consequential tool invocation.
  • Engineering: Implement pre-execution hooks that validate sensitive tool calls against an external policy engine before approval. Counters the absence of action controls on cloud sessions.
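
A pre-execution hook of the kind described above needs a policy function that maps each proposed tool call to a decision. The sketch below assumes a hypothetical tool-call shape (a dict with `tool` and `command` keys) and a hypothetical three-way verdict; only the `expose_port` tool name comes from the report, and nothing here reflects Devin's actual hook interface.

```python
# Tools denied outright. expose_port is named in the report; the dict shape
# of a tool call and everything else here is an assumption for illustration.
DENY_TOOLS = {"expose_port"}

# Illustrative shell fragments that read environment variables in bulk.
SENSITIVE_SHELL_TOKENS = ("printenv", "/proc/self/environ")


def evaluate_tool_call(call: dict) -> str:
    """Return 'deny', 'review', or 'allow' for a proposed tool call."""
    tool = call.get("tool", "")
    if tool in DENY_TOOLS:
        return "deny"
    if tool == "shell":
        command = call.get("command", "")
        if any(token in command for token in SENSITIVE_SHELL_TOKENS):
            return "review"  # route to a human approver
    return "allow"


print(evaluate_tool_call({"tool": "expose_port", "port": 8000}))     # deny
print(evaluate_tool_call({"tool": "shell", "command": "printenv"}))  # review
print(evaluate_tool_call({"tool": "shell", "command": "ls -la"}))    # allow
```

Keeping the decision in a pure function makes the policy easy to test and lets the same rules back both a blocking hook and an audit-only dry run.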

Output Guardrails

Output guardrails inspect what the agent sends to other systems and users.

  • Policy: Prohibit storing production credentials in the Devin Secrets Manager and use short-lived tokens with minimal scopes instead. Counters credential exfiltration via environment variables.
  • Configuration: Restrict image rendering and link unfurling to allowlisted domains in session configuration. Counters demonstrated exfiltration via image URLs and link previews.
  • Engineering: Build a DLP proxy monitoring all DevBox outbound traffic for patterns matching secret formats before reaching external endpoints. Counters multi-channel secret exfiltration.
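
The DLP tip above depends on recognizing secret formats in outbound payloads. The Python sketch below shows pattern matching for a few well-known formats; the pattern set and the function name `find_secrets` are illustrative choices, and the sample key is AWS's published documentation example, not a real credential.

```python
import re

# A small, non-exhaustive set of secret formats.
SECRET_PATTERNS = {
    # AWS access key IDs: a 4-letter prefix followed by 16 uppercase alphanumerics.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # Classic GitHub personal access tokens.
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    # PEM private key headers.
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(payload: str) -> list:
    """Return the names of all secret patterns found in an outbound payload."""
    return sorted(
        name for name, pattern in SECRET_PATTERNS.items() if pattern.search(payload)
    )


outbound = "GET /collect?k=AKIAIOSFODNN7EXAMPLE HTTP/1.1"
print(find_secrets(outbound))            # ['aws_access_key_id']
print(find_secrets("GET / HTTP/1.1"))    # []
```

Plain pattern matching misses secrets that are encoded or split before sending, so it complements the egress allowlist and short-lived credentials rather than replacing them.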

Monitoring

Monitoring captures what the agent did and surfaces anomalies for review.

  • Policy: Require security team review of audit logs weekly with defined escalation procedures for unusual session patterns. Counters the absence of automated anomaly detection.
  • Configuration: Forward enterprise audit log events to organizational SIEM via the v3 API with alerting on credential-access and port-exposure events. Counters the lack of real-time monitoring.
  • Engineering: Instrument DevBox network traffic with behavioral anomaly detection to flag unexpected outbound connections during sessions. Counters silent exfiltration without operator visibility.
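
The alerting and anomaly-detection tips above can be prototyped as two small filters before being ported to SIEM rules. This sketch makes several assumptions that should be checked against the real data: the event field `action`, the event names in `ALERT_ACTIONS`, and the connection record shape are all hypothetical and do not describe the vendor's audit log schema.

```python
# Hypothetical event names; the real audit log schema may differ.
ALERT_ACTIONS = {"secret_accessed", "port_exposed"}


def select_alerts(events):
    """Keep only audit events whose action is on the alert list."""
    return [event for event in events if event.get("action") in ALERT_ACTIONS]


def unexpected_destinations(connections, baseline_hosts):
    """Return outbound hosts seen in a session that are not in the baseline."""
    seen = {conn["host"] for conn in connections}
    return sorted(seen - set(baseline_hosts))


events = [
    {"session": "s-1", "action": "session_started"},
    {"session": "s-1", "action": "port_exposed"},
]
connections = [{"host": "github.com"}, {"host": "attacker.example"}]

print(select_alerts(events))
print(unexpected_destinations(connections, {"github.com"}))
```

A set difference against a per-repository baseline is a deliberately simple form of anomaly detection; it produces false positives whenever a task legitimately needs a new host, which is why the policy tip pairs it with a defined escalation path.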

7 References

The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [7][8]) refer to entries below, listed in citation order.

Selected Vulnerabilities

  1. CVE-2024-56083: Cognition Devin VSCode Live Share URL exposure allowing unauthorized write access to session code (CVSS 8.1 HIGH). Patched on 2024-12-12 by requiring authentication beyond URL knowledge.

Selected Research

  2. Full DevBox Compromise via Prompt Injection: Johann Rehberger demonstrated that indirect prompt injection from a website leads to full DevBox compromise including Sliver C2 malware execution and lateral movement capability.
  3. Multi-Channel Secret Exfiltration: Demonstrated five independent exfiltration channels (shell curl, browser navigation, image rendering, link unfurling, ASCII smuggling) all capable of leaking secrets from the DevBox environment variables.
  4. AI Kill Chain Port Exposure: Demonstrated that prompt injection can invoke the expose_port tool to publish a file server on a public .devinapps.com URL without any human verification or approval gate.
  5. AI Coding Agent Security Assessment: Pillar Security assessment identifying Devin as lacking adequate input validation, output filtering, and programmatic policy enforcement compared to agents with hook-based policy systems.
  6. Devin Hardening Guide: Community-maintained hardening guide aggregating all demonstrated Devin vulnerabilities with comparative analysis against other coding agents and specific mitigation recommendations.

Vendor Documentation

  7. Cognition Trust Center: The vendor trust center confirms SOC 2 Type II certification since September 2024 and offers NDA-gated access to pentest reports and network diagrams.
  8. Enterprise Security Documentation: Vendor-published security documentation covering encryption practices, compliance certifications, vulnerability disclosure program, and secrets handling guidance.
  9. Enterprise Deployment Architecture: Documents the Brain plus DevBox architecture: the stateless Brain drives reasoning while the ephemeral DevBox VM hosts all tool execution, establishing the isolation boundary that defense controls must enforce.
  10. Devin Knowledge System: Documents the persistent organizational Knowledge feature with trigger-based auto-retrieval, automated generation from repository structure, and cross-session sharing.
  11. Devin for Terminal Permissions: Documents the four permission modes (Normal, Accept Edits, Bypass, Autonomous) including the single-step /bypass command that removes all approval gates.
  12. VPC Deployment Overview: Documents per-session VM isolation, AES-256 encryption at rest, TLS 1.3 in transit, isolated Brain containers, and customer-managed DevBox setup.
  13. Devin MCP Server: Documents the MCP server providing programmatic session management, knowledge CRUD, playbook management, and scheduling for any MCP-compatible client.

Other Sources

  14. Your Agent Has Root: Independent timeline reconstruction placing the Embrace The Red findings into a broader kill-chain taxonomy, confirming that the port-exposure and C2 stages form a complete compromise sequence beyond isolated PoCs.
  15. Devin Integrations Overview: Documents native integrations with GitHub, Slack, and Jira plus the MCP marketplace connecting to Sentry, Datadog, PostgreSQL, and dozens more external services.
  16. Devin for Terminal Team Settings: Documents enterprise sandbox enforcement modes including the Optional default that leaves OS-level isolation disabled unless explicitly configured by administrators.
  17. Devin Advanced Capabilities: Documents parallel sessions, scheduling, playbooks, autonomous task decomposition, and headless operation via MCP.