OpenAI Codex Agent Security Risks

Coding Agents | openai.com | Exposed Giants
[Figure: AI Risk Quadrant position. Axes: Defense Controls (5) and Attack Surface (6.8). Quadrants: Exposed Giants, Fortified Leaders, Humble Providers, Tight Operators.]
AIRQ Score: 4.49 (High)
Attack Surface: 6.8 (High)
Blast Radius: 5.38 (High)
Defense Controls: 5 (High)
About The Agent

OpenAI Codex is a cloud and local AI coding agent that operates across a CLI, an IDE extension, and a web interface, running model-generated shell commands inside an OS-native sandbox with network disabled by default. The agent auto-loads project instructions and MCP server configurations from repository directories, shares a GitHub User Access Token with every cloud container, and delegates approval decisions to an auto-review subagent when configured, giving each integrated channel the same execution authority over the operator's workspace and connected repositories.

About the AI Risk Quadrant

Exposed Giants placement reflects an attack surface driven to the upper band by critical configuration-trust and tool-execution CVEs alongside a moderate blast radius constrained by the default network-off sandbox posture. OpenAI Codex ships meaningful isolation and approval defaults that keep the blast radius below the top band, but repeated sandbox bypasses and a broad-scope GitHub token undercut those controls, leaving operators with high exposure that documented defenses only partially contain.

1 Key Risks

The most critical security risks an operator inherits when deploying this agent in its documented default configuration. The dominant exposures concentrate in project-file auto-loading, sandbox boundary integrity, and the broad-scope GitHub token that persists inside every cloud container.

Key Input Risks
Untrusted content from project files, repository data, and MCP server configurations reaches the reasoning loop without a prompt shield or injection detection layer. CVE-2025-61260 (CVSS 9.8) demonstrated that a config file in a malicious repository can execute arbitrary commands on startup; operators should require review of config.toml and AGENTS.md before commit, or disable project-file auto-loading on untrusted clones. [1][5]
Key Execution Risks
Shell commands execute within an OS-native sandbox that has been bypassed twice, once through model-directed path manipulation and once through symlink following. CVE-2025-59532 allowed arbitrary file writes outside the workspace boundary, and CVE-2025-55345 demonstrated a symlink-based sandbox escape leading to arbitrary file overwrite. [2][3]
Key Action Risks
The default approval policy pauses on sandbox-boundary crossings, but a single configuration key (approval_policy set to never) disables every gate. GitHub tokens with repository read-write scope persist inside every cloud container, and a branch-name command injection demonstrated automated token theft across multiple users. [6][8]
Key Output Risks
Agent outputs flow to terminal, file system, and GitHub operations without credential redaction or data-loss prevention controls. No documented exfiltration blocking exists for the output channel, leaving operator secrets and source code exposed to any successful prompt injection. [8][9]
Key Monitoring Risks
Structured telemetry through OpenTelemetry and the Compliance API is available but entirely opt-in and disabled by default. Without enabling the otel block in config.toml, the operator has no audit trail, no active alerting, and no anomaly detection for agent-driven actions. [9][11]

2 AIRQ Scores

The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. OpenAI Codex lands in the Exposed Giants quadrant with an attack surface driven to the upper band by multiple critical CVEs and a blast radius held below the top tier by its default network-off sandbox.

AIRQ Metrics

The attack surface sits in the upper band while the blast radius stays below the top tier, placing OpenAI Codex among agents whose documented defenses partially offset substantial exploitation history.

Each row below summarizes one axis on its native scale: attack surface and blast radius out of ten, defense controls out of fifteen, and the AIRQ composite out of fifteen.

Metric | Score | Comments
AIRQ Score | 4.49 | Exploitation history against the default configuration offsets moderate defense investment, placing the composite in the lower third of the scale.
Blast Radius | 5.38 / 10 | Default network-off posture and workspace-scoped writes hold the blast radius below the top tier despite full shell access and broad-scope credential sharing.
Attack Surface | 6.8 / 10 | Multiple critical-severity CVEs against configuration trust and tool execution surfaces elevate the attack score alongside all three conditions for untrusted input, sensitive data access, and external egress being met.
Defense Controls | 5 / 15 | OS-native sandboxing and approval gates provide a meaningful baseline, but opt-in telemetry and the confirmed absence of output filtering leave the monitoring and exfiltration-prevention layers unfilled.

3 Attack Surface

Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. The dominant exposures are the auto-loaded project configuration layer, the MCP server trust boundary, and the tool-execution sandbox whose boundaries have been bypassed by multiple independent researchers.

Attack Surface Metrics

Four of ten surfaces carry the adjusted ceiling after critical-severity CVE penalties, and the remaining surfaces distribute across the lower bands.

Each row ties an attack surface to its base architectural band, any agent-specific penalty, and the evidence anchoring both.

Surface | Score | Comments
User Input | 3 / 4 | Multiple input channels across CLI, IDE, web, and auto-loaded project files feed the reasoning loop without a dedicated prompt shield or injection detection layer. [9]
External Data | 5 / 4 | Auto-loads AGENTS.md, .env, and .codex/config.toml from project directories without content validation; CVE-2025-55345 demonstrated prompt injection via the auto-loaded AGENTS.md instruction chain triggering symlink sandbox escape. [3][7][15]
Memory | 2 / 4 | Session history persists locally by default; cross-session state leakage of approved command prefixes across projects has been documented in the open repository [13], though no formal adversarial memory poisoning chain has been demonstrated.
Reasoning | 2 / 4 | Multi-step reasoning uses a vendor-provided model with visible chain-of-thought; a cyber safety classifier screens for dangerous patterns but is not a dedicated reasoning-chain integrity control. [9]
Planning | 2 / 4 | Multi-step task decomposition operates within the sandbox boundary by default and pauses for approval when exceeding it, constraining autonomous plan scope to the workspace. [8]
Tool Execution | 5 / 4 | Full shell access within the sandbox has been bypassed twice: CVE-2025-59532 via model-generated cwd manipulation and CVE-2025-55345 via symlink following outside workspace-write boundaries; an earlier auto-approval bug in ripgrep flags widened the execution surface before patching. [2][3][4]
Orchestration | 2 / 4 | Multi-step execution runs within a user-supervised session with no built-in daemon, cron, or background scheduling capability on the default configuration. [8]
Inter-Agent | 5 / 4 | MCP servers configured in project-local config.toml are auto-spawned without authentication; CVE-2025-61260 (CVSS 9.8) proved arbitrary command execution via malicious MCP entries. [1][5]
Output Processing | 1 / 4 | Output channels are limited to terminal text, file writes, and GitHub operations with no documented credential redaction, DLP, or exfiltration blocking. [8]
Configuration | 5 / 4 | Auto-executes .codex/config.toml and .env from untrusted project directories; CVE-2025-61260 (CVSS 9.8) demonstrated zero-click code execution through a cloned repository. [1][5]
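
Both sandbox bypasses in the Tool Execution row involve a write target that appears workspace-local but resolves elsewhere. The sketch below illustrates that class of check in Python. It is not Codex's enforcement mechanism, which the report describes as OS-native; the helper name `is_inside_workspace` is hypothetical and exists only for illustration.

```python
from pathlib import Path


def is_inside_workspace(workspace: Path, target: Path) -> bool:
    """Return True only if target stays under workspace after resolving symlinks and '..'."""
    root = workspace.resolve()
    # Interpret relative targets against the workspace root, not the process cwd.
    candidate = target if target.is_absolute() else root / target
    # resolve() follows symlinks and is non-strict, so not-yet-created files are accepted.
    resolved = candidate.resolve()
    return resolved == root or root in resolved.parents
```

A check like this is inherently racy if the file system changes between the check and the write, which is one reason kernel-level enforcement is preferable to an application-level guard.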

The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. OpenAI Codex auto-loads repository files and MCP configurations into its reasoning loop, holds a broad-scope GitHub token inside every cloud container, and communicates with OpenAI APIs and configured MCP endpoints on the default posture. [1][6]

Lethal Trifecta · Complete (3 of 3)

OpenAI Codex exhibits all three of these conditions in its documented default configuration:

  • Untrusted input — Repository content, AGENTS.md project instructions, and MCP server configurations from untrusted project directories enter the reasoning loop without content validation. [1]
  • Sensitive data — The agent reads workspace source code, environment variables, and holds a GitHub User Access Token with repository read-write scope inside each cloud container. [6]
  • External egress — The agent communicates with OpenAI APIs by default and can reach configured MCP endpoints and network-allowed destinations when internet access is enabled. [8]
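
The three conditions above reduce to a simple conjunction. A minimal Python sketch, assuming a hypothetical `SessionProfile` record that an operator fills in per deployment (the type and field names are illustrative and do not come from any Codex API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionProfile:
    untrusted_input: bool   # e.g. auto-loaded AGENTS.md or MCP config from a cloned repo
    sensitive_data: bool    # e.g. source code, environment variables, a GitHub token
    external_egress: bool   # e.g. API calls, MCP endpoints, enabled internet access


def trifecta_conditions_met(profile: SessionProfile) -> int:
    """Count how many of the three conditions hold for this session."""
    return sum([profile.untrusted_input, profile.sensitive_data, profile.external_egress])


def is_lethal_trifecta(profile: SessionProfile) -> bool:
    """The trifecta is complete only when all three conditions hold at once."""
    return trifecta_conditions_met(profile) == 3


# The documented default posture described above meets all three conditions.
codex_default = SessionProfile(untrusted_input=True, sensitive_data=True, external_egress=True)
print(is_lethal_trifecta(codex_default))  # prints "True"
```

Removing any one condition for a given session, for example by working only on reviewed repositories, breaks the chain for that session.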

4 Blast Radius

The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. Compromise of the agent yields full shell execution within the sandbox, access to the operator's workspace files and GitHub token, but limited network reach due to the default-off network posture.

Blast Radius Metrics

Two of six factors sit at the upper band for code execution and credential access, while the remaining four stay in the lower bands due to default-off networking and scoped autonomy.

Each row ties a blast factor to the scope of damage an attacker gains after compromising the agent's reasoning loop on the documented default.

Factor | Score | Comments
Code execution | 3 / 4 | Full shell access with operator-level privileges inside the sandbox; CVE-2025-59532 demonstrated boundary bypass enabling command execution outside the intended workspace. [2]
File system access | 2 / 4 | Read-write access is scoped to the active workspace by default; read access extends beyond the workspace with approval, and symlink-based escape has been demonstrated. [3]
Network access | 1 / 4 | Network access is blocked by default via OS-native sandbox enforcement; configurable domain-allowlist controls restrict outbound traffic when network is enabled. [10]
Credential access | 3 / 4 | A broad-scope repository token persists inside every cloud container with read-write access to the operator's GitHub organizations; branch-name command injection demonstrated automated multi-user token exfiltration across four Codex product surfaces. [6][14]
Autonomous action | 2 / 4 | Autonomous execution within sandbox boundaries operates without per-action approval; sandbox-crossing actions require explicit user or auto-review approval by default. [8]
Deployment access | 2 / 4 | The agent can commit, push, and create pull requests through its GitHub token but has no direct deploy, publish, or infrastructure modification capability on the default. [8]

5 Defense Controls

Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. OpenAI Codex ships OS-native sandboxing and configurable approval gates as default-on controls, but telemetry, output filtering, and input guardrails remain opt-in or absent.

Defense Controls Metrics

Higher scores indicate stronger vendor-implemented safeguards; the sandboxing and approval tiers anchor the defense floor while absent output guardrails and opt-in telemetry leave the upper bands unfilled.

Each component is scored on the vendor-documented default posture; opt-in mitigations reappear as hardening tips that an operator can layer on top.

Component | Score | Comments
Input Guardrails | 1 / 3 | A cyber safety classifier screens prompts for dangerous activity patterns, but the agent lacks a dedicated prompt injection detection layer that filters adversarial instructions embedded in project files before the reasoning loop processes them. [9]
Execution Isolation | 2 / 3 | OS-native Seatbelt and Landlock sandboxing with network-off default and workspace-scoped writes; multiple sandbox-bypass CVEs validated that the enforcement boundary exists but was breakable, and each bypass has since been patched. [2][3][10]
Action Controls | 1 / 3 | Approval policy pauses on sandbox-boundary crossings by default, but one configuration toggle removes every approval gate; auto-review mode delegates approval to an AI subagent whose acceptance criteria are not operator-configurable. [8][10]
Output Guardrails | 0 / 3 | Vendor documentation and the system card confirm no credential redaction, DLP, or exfiltration channel blocking exists for agent outputs on any deployment surface. [8]
Monitoring | 1 / 3 | OpenTelemetry export covers prompts, tool approvals, tool results, and network decisions but is off by default; enterprise audit logs are available through the Compliance API, and SOC 2 Type 2 certification covers the underlying platform. [11][12]
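
The headline Defense Controls score can be reproduced directly from the component rows. The short Python sketch below sums the five component scores as listed in the table. The report does not state how the Attack Surface and Blast Radius rows combine into their headline values, so the sketch makes no attempt to reproduce those.

```python
# Component scores copied from the Defense Controls table (each out of 3).
DEFENSE_COMPONENTS = {
    "Input Guardrails": 1,
    "Execution Isolation": 2,
    "Action Controls": 1,
    "Output Guardrails": 0,
    "Monitoring": 1,
}
MAX_PER_COMPONENT = 3


def defense_total(components: dict) -> tuple:
    """Return (achieved, maximum) across all defense components."""
    return sum(components.values()), MAX_PER_COMPONENT * len(components)


total, maximum = defense_total(DEFENSE_COMPONENTS)
print(f"Defense Controls: {total} / {maximum}")  # prints "Defense Controls: 5 / 15"
```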

6 Hardening Tips

Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. The highest-leverage changes are enabling structured telemetry, restricting MCP server auto-loading, and deploying a prompt injection scanner ahead of the reasoning loop.

Input Guardrails

Input guardrails intercept adversarial content before it reaches the reasoning loop.

  • Policy: Require security review of all AGENTS.md and .codex/config.toml files before they enter shared repositories; counters Configuration at adjusted ceiling.
  • Configuration: Restrict project_doc_max_bytes in managed config.toml to limit the byte budget for auto-loaded AGENTS.md on untrusted repositories, or set it to zero on clones from unknown sources; counters External Data at adjusted ceiling while preserving instructions on trusted repositories.
  • Engineering: Deploy a prompt injection detection classifier ahead of the Codex reasoning loop to filter adversarial instructions from project files; counters User Input at the upper band.
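
A review policy for auto-loaded files needs a way to find them first. The Python sketch below lists the files this report names as auto-loaded (AGENTS.md, .env, .codex/config.toml) anywhere under a clone, so a reviewer can inspect them before an agent session starts. The function name `find_auto_loaded_files` is hypothetical, and whether Codex discovers these files at every directory depth is an assumption, not something the report specifies.

```python
from pathlib import Path

# File names the report lists as auto-loaded from project directories.
AUTO_LOADED = ("AGENTS.md", ".env", ".codex/config.toml")


def find_auto_loaded_files(repo_root: Path) -> list:
    """Return repo-relative paths of auto-loaded files found under repo_root, sorted."""
    root = repo_root.resolve()
    hits = set()
    for path in root.rglob("*"):  # pathlib globbing includes dot-files and dot-directories
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if any(rel == name or rel.endswith("/" + name) for name in AUTO_LOADED):
            hits.add(Path(rel))
    return sorted(hits)
```

Run against a fresh clone, the returned list is the set of files to review before the repository is opened with the agent.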

Execution Isolation

Execution isolation contains what a compromised agent can do on the host.

  • Policy: Mandate that all Codex sessions run in container or VM isolation rather than on developer workstations; counters code execution blast at the upper band.
  • Configuration: Keep sandbox_mode at workspace-write and never configure danger-full-access in shared or managed deployments; counters tool execution at adjusted ceiling.
  • Engineering: Wrap Codex CLI in a disposable container image with read-only root filesystem and no host credential mounts; counters credential blast at the upper band.

Action Controls

Action controls govern which tools and actions the agent can invoke autonomously.

  • Policy: Prohibit approval_policy=never and auto_review in organizational configuration baselines; counters the single-toggle bypass that removes all sandbox-boundary approval gates and expands autonomous action scope.
  • Configuration: Set approval_policy to on-request or untrusted in managed config.toml and lock it via admin policy override; counters action controls bypass risk.
  • Engineering: Build a pre-execution hook that validates every shell command against an organizational allowlist before sandbox execution; counters tool execution at adjusted ceiling.
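
One possible shape for the allowlist hook in the Engineering tip is sketched below in Python. The allowlist contents, the denied-flag table, and the function name `is_command_allowed` are illustrative assumptions. The report does not document a Codex hook interface, so wiring this into an actual session is left open.

```python
import shlex

# Hypothetical organizational allowlist of bare executable names.
ALLOWED_EXECUTABLES = frozenset({"git", "ls", "cat", "pytest", "rg"})
# Illustrative per-executable flags to refuse even when the executable is allowed.
DENIED_FLAGS = {"rg": frozenset({"--pre", "--hostname-bin"})}
# Characters that chain, redirect, or substitute commands; refused outright, even when quoted.
FORBIDDEN_CHARS = frozenset(";|&<>()`$\n")


def is_command_allowed(command: str) -> bool:
    """Conservatively decide whether a single shell command may run."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return False
    if not argv or argv[0] not in ALLOWED_EXECUTABLES:
        return False
    denied = DENIED_FLAGS.get(argv[0], frozenset())
    # Match both "--flag value" and "--flag=value" spellings.
    return not any(arg == flag or arg.startswith(flag + "=") for arg in argv[1:] for flag in denied)
```

The denied-flag table reflects the lesson of the ripgrep auto-approval issue cited in this report: approving by executable name alone is not enough when individual flags can launch other programs.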

Output Guardrails

Output guardrails inspect what the agent sends to other systems and users.

  • Policy: Establish a data classification policy that blocks Codex from operating in repositories containing secrets or PII; counters absent output guardrails.
  • Configuration: Rotate GitHub tokens to minimum required scopes and enable token expiration for all Codex cloud environments; counters broad-scope token exposure in cloud containers.
  • Engineering: Deploy a credential-scanning proxy between Codex output and GitHub operations to catch leaked secrets before they reach remote repositories; counters absent output guardrails.
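
The core of a credential-scanning step can be sketched as a redaction pass over outgoing text. The Python example below matches GitHub-style token prefixes with deliberately loose patterns. The patterns, the placeholder format, and the function name `redact_secrets` are assumptions for illustration. A production scanner would cover many more secret types.

```python
import re

# Loose patterns for GitHub-style tokens; lengths are lower bounds, not exact formats.
TOKEN_PATTERNS = {
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    "github_fine_grained_pat": re.compile(r"\bgithub_pat_[A-Za-z0-9_]{22,}\b"),
}


def redact_secrets(text: str) -> tuple:
    """Replace token-like strings with placeholders; return (clean_text, finding_names)."""
    findings = []
    for name, pattern in TOKEN_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        findings.extend([name] * count)
    return text, findings
```

A proxy built on this would block or quarantine any push whose findings list is non-empty, rather than silently forwarding the redacted text.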

Monitoring

Monitoring captures what the agent did and surfaces anomalies for review.

  • Policy: Require OTel export to a centralized SIEM for every Codex deployment before granting production repository access; counters monitoring at the basic tier.
  • Configuration: Enable the otel block in config.toml with log_user_prompt=false and route events to the organizational collector endpoint; counters monitoring at the basic tier.
  • Engineering: Forward Codex Compliance API logs into the organizational SIEM and build detection rules for sandbox-bypass patterns and unusual tool approval sequences; counters monitoring at the basic tier.

7 References

This section lists the evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [7][13]) refer to entries below, listed in citation order.

Selected Vulnerabilities

  1. CVE-2025-61260. MCP config command injection (CVSS 9.8) in @openai/codex before 0.23.0 enabling arbitrary code execution via project-local config files, patched in 0.23.0.
  2. CVE-2025-59532. Sandbox boundary bypass in @openai/codex 0.2.0-0.38.0 via model-generated cwd enabling arbitrary file writes outside workspace, patched in 0.39.0.
  3. CVE-2025-55345. Symlink following outside workspace-write sandbox (CVSS 8.8) enabling arbitrary file overwrite and RCE via AGENTS.md prompt injection.
  4. CVE-2025-54558. Auto-approval of ripgrep with dangerous flags (CVSS 4.1) in @openai/codex before 0.9.0 enabling unauthorized command execution.

Selected Research

  5. OpenAI Codex CLI Command Injection. Check Point Research demonstrates CVE-2025-61260 end-to-end with reverse shell and credential harvesting through malicious MCP config files.
  6. Codex Branch Injection Token Theft. BeyondTrust Phantom Labs discloses branch-name command injection in Codex cloud demonstrating automated GitHub token theft across multiple users.
  7. Codex CLI Symlink Sandbox Escape. JFrog Security Research demonstrates CVE-2025-55345 combining AGENTS.md prompt injection with symlink sandbox escape for arbitrary file writes.

Vendor Documentation

  8. Agent Approvals and Security. OpenAI docs covering sandbox modes, approval policies, network access, OTel monitoring, and the two-layer security model.
  9. GPT-5.2-Codex System Card Addendum. System card documenting containerized isolation, network-disabled default, Seatbelt and Landlock sandboxing, and safety evaluation results.
  10. Codex Sandbox Documentation. Vendor docs detailing workspace-write default, danger-full-access mode, network-off enforcement, and platform-native sandbox via Seatbelt and seccomp plus Landlock.
  11. Running Codex Safely at OpenAI. Vendor blog describing managed config, auto-review mode, domain-restricted network policies, and OTel telemetry forwarding to centralized SIEM.
  12. OpenAI Trust Portal. Compliance portal documenting SOC 2 Type 2 and ISO 27001/27017/27018/27701 certifications covering the API platform and ChatGPT services.

Other Sources

  13. Cross-Session State Leak. GitHub issue documenting approved command prefixes leaking across projects and sessions via hidden developer context.
  14. Codex Branch Injection Analysis. Independent analysis of the BeyondTrust disclosure covering disclosure timeline and architectural token-scope concern.
  15. AGENTS.md Custom Instructions Guide. Vendor guide documenting instruction chain discovery, project scope layering, byte limits, and fallback filename configuration.