1 Key Risks
The most critical security risks an operator inherits when deploying this agent in its documented default configuration. ChatGPT Atlas presents a trifecta-complete risk profile where always-authenticated browsing, persistent memory, and unrestricted egress converge on an agent with minimal monitoring controls.
2 AIRQ Scores
The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. The AIRQ composite captures how ChatGPT Atlas balances demonstrated input-layer vulnerabilities against its constrained execution and deployment scope. [13]
ChatGPT Atlas lands in the Exposed Giants quadrant with elevated attack surface 5.58 out of 10, blast radius 3.88 out of 10, and defense controls 6 out of 15.
Each axis uses its own denominator: attack surface and blast radius scale to 10, defense controls to 15, and the AIRQ composite to approximately 15.
| Metric | Score | Comments |
|---|---|---|
| AIRQ Score | 3.59 | Moderate composite reflects constrained blast radius offsetting elevated attack surface and weak defenses. |
| Blast Radius | 3.88 / 10 | Unrestricted network egress and credential access are the dominant factors; absent code execution caps the ceiling. |
| Attack Surface | 5.58 / 10 | Demonstrated memory injection and full three-axis threat convergence drive the score above the 4.8 floor. |
| Defense Controls | 6 / 15 | Execution isolation and action controls partially documented; monitoring is absent and remaining guardrails provide only pattern-level detection. |
3 Attack Surface
Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. The Atlas agent reasoning loop ingests arbitrary web page content, user instructions, and persistent memory entries as first-class input without independent origin-based filtering.
Higher scores indicate surfaces where the agent accepts untrusted input with minimal validation or where demonstrated exploitation confirms the exposure.
Each row maps a named surface to its adjusted score and a one-line rationale citing the evidence anchor.
| Surface | Score | Comments |
|---|---|---|
| User Input | 4 / 4 | Always-authenticated default accepts web page content and user instructions into the CUA reasoning loop; CSRF memory injection demonstrated by LayerX. [1] |
| External Data | 4 / 4 | Agent ingests full page DOM content from visited sites into reasoning context; malicious page content demonstrated as injection vector. [2] |
| Memory | 4 / 4 | Persistent cross-session browser memories with automated writes; LayerX demonstrated CSRF-based injection of hidden instructions into ChatGPT memory. [12] [1] |
| Reasoning | 3 / 4 | Multi-step CUA reasoning processes arbitrary web content with adversarial training but no independent prompt shield per vendor disclosure. [10] [4] |
| Planning | 3 / 4 | Autonomous multi-step task decomposition in Agent Mode with user-configurable confirmation gates per vendor documentation. [8] |
| Tool Execution | 2 / 4 | Browser interaction only via screenshot-based CUA model; no shell, file download, or extension installation capability documented. [8] |
| Orchestration | 2 / 4 | Multi-step task execution within a single user-supervised session; vendor documentation describes no background or scheduled execution capability. [7] |
| Inter-Agent | 0 / 4 | Standalone agent with no inter-agent communication, no MCP integration, and no external AI service calls documented. [7] |
| Output Processing | 3 / 4 | Rich browser output including navigation to arbitrary URLs and form submissions; no documented DLP or URL sanitization. [9] |
| Configuration | 2 / 4 | User-configurable browser memories, diagnostic logs, and training preferences; custom instructions can modify agent behavior. [9] |
The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. Atlas ingests arbitrary web page content into an always-authenticated session with unrestricted outbound navigation, satisfying all three conditions on the default configuration.
ChatGPT Atlas exhibits all three of these conditions in its documented default configuration:
- Untrusted input — The CUA reasoning loop processes full DOM content from any page the agent visits or renders without origin-based filtering. [1]
- Sensitive data — The agent operates in always-authenticated browser sessions with access to cookies, OAuth tokens, and session state across all logged-in services. [2]
- External egress — The agent navigates to arbitrary URLs as part of task execution with no egress filtering or URL allowlisting documented. [7]
4 Blast Radius
The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. A compromised Atlas agent reaches the user's full authenticated web footprint via unrestricted navigation but cannot execute code, access files, or deploy infrastructure. [6]
Higher blast scores indicate factors where the agent can cause broader damage; absent capabilities score zero.
Each row maps a blast factor to its score and the specific capability boundary that determines the rating.
| Factor | Score | Comments |
|---|---|---|
| Code execution | 0 / 4 | Agent cannot run code, download files, or install extensions; no shell access or sandboxed execution runtime documented. [8] |
| File system access | 0 / 4 | Agent cannot access other applications or the local file system; browsing is isolated from the host OS per vendor documentation. [8] |
| Network access | 4 / 4 | Unrestricted outbound web navigation to any URL without origin restrictions or allowlisting; no egress controls documented. [7] |
| Credential access | 3 / 4 | Agent operates in always-authenticated sessions with access to cookies and OAuth tokens; unencrypted token storage demonstrated. [3] [2] |
| Autonomous action | 2 / 4 | Multi-step autonomous actions with Watch Mode, Takeover Mode, and confirmation prompts; no single-step bypass documented. [8] |
| Deployment access | 0 / 4 | No access to cloud infrastructure, deployment pipelines, or production environments; scope limited to web browsing. [7] |
5 Defense Controls
Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. OpenAI documents execution isolation and action approval gates but provides no monitoring, SIEM integration, or independent input filtering on the default configuration.
Higher defense scores indicate stronger vendor-implemented safeguards; the inverted scale means zero is worst.
Each component is scored based on vendor-documented controls present in the default configuration.
| Component | Score | Comments |
|---|---|---|
| Input Guardrails | 1 / 3 | Adversarially trained CUA model provides pattern-based injection detection; no standalone prompt-filtering layer exists per vendor documentation. [10] [4] |
| Execution Isolation | 2 / 3 | Agent contained to browser actions with no code execution or file system access; browser process isolated from host OS and other desktop applications. [8] |
| Action Controls | 2 / 3 | Configurable approval via confirmation prompts, Watch Mode for sensitive sites, and Takeover Mode per vendor documentation. [8] |
| Output Guardrails | 1 / 3 | Basic PII redaction from browser memory summaries documented; no DLP for agent navigation actions or exfiltration blocking. [9] |
| Monitoring | 0 / 3 | No Compliance API, SIEM hookpoints, or eDiscovery support documented; enterprise administrators lack visibility into agent activity. [11] |
6 Hardening Tips
Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. Operators should prioritize breaking the three-axis threat convergence by disabling persistent memory or enforcing logged-out mode before addressing monitoring gaps.
Input Guardrails
Input guardrails intercept adversarial content before it reaches the reasoning loop.
- Policy Restrict Atlas usage to non-sensitive tasks and non-confidential data until prompt injection defenses mature.
- Configuration Enable logged-out mode by default for all browsing sessions to eliminate the always-authenticated attack surface.
- Engineering Deploy a browser isolation proxy between Atlas and sensitive internal applications to contain prompt injection scope.
Execution Isolation
Execution isolation contains what a compromised agent can do on the host.
- Policy Prohibit Atlas installation on machines with access to production systems or sensitive development environments.
- Configuration Configure network-level controls to restrict which domains the Atlas agent can reach during task execution.
- Engineering Implement a URL-filtering proxy that blocks agent navigation to known-malicious or out-of-scope domains.
Action Controls
Action controls govern which tools and actions the agent can invoke autonomously.
- Policy Require Watch Mode activation for all agent sessions involving authenticated services or financial transactions.
- Configuration Configure custom instructions to enforce mandatory confirmation before form submissions or account-modifying actions.
- Engineering Build an approval workflow that routes high-risk agent actions through secondary human review before execution.
Output Guardrails
Output guardrails inspect what the agent sends to other systems and users.
- Policy Establish a policy prohibiting Atlas for tasks involving PII, credentials, or regulated data until adequate data-loss-prevention controls are available and independently verified.
- Configuration Disable browser memories to prevent persistent storage of sensitive information extracted during agent sessions.
- Engineering Implement network monitoring to detect and alert on unusual outbound data patterns from the Atlas browser process.
Monitoring
Monitoring captures what the agent did and surfaces anomalies for review.
- Policy Require periodic manual review of Atlas browsing history and browser memories for unauthorized agent activity.
- Configuration Forward Atlas process network logs to SIEM as a compensating control until native Compliance API integration ships.
- Engineering Build custom telemetry collection from the Atlas SQLite database to detect credential exposure and memory injection.
7 References
The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [7, 13]) refer to entries below, listed in citation order.
Selected Vulnerabilities
- ChatGPT Tainted Memories CSRF injection LayerX demonstrated CSRF-based persistent memory injection on ChatGPT Atlas (2025)
- Agentic Browsers and the Same-Origin Policy UW researchers demonstrated cross-origin data theft on ChatGPT Atlas Agent Mode (2026)
- ChatGPT Atlas OAuth token exposure Pete Johnson demonstrated unencrypted OAuth token storage in Atlas SQLite (2025)
Selected Research
- Hardening Atlas against prompt injection OpenAI discloses adversarial training and automated red-team loop for Atlas (2025)
- OpenAI CISO on Atlas prompt injection risks Simon Willison documents Dane Stuckey disclosure on unsolved prompt injection (2025)
- Computer-Using Agents in cyber attacks Push Security analyzes CUA credential exposure surfaces (2025)
Vendor Documentation
- ChatGPT Atlas product page Vendor page documenting agent capabilities and default configuration
- ChatGPT Agent on Atlas documentation OpenAI documents Agent Mode safeguards including Watch Mode and Takeover Mode
- ChatGPT Atlas Data Controls and Privacy OpenAI documents browser memory architecture and PII filtering pipeline
- Operator System Card OpenAI system card for CUA model documenting multi-layered safety approach
- ChatGPT Atlas for Enterprise OpenAI states Atlas not in SOC 2 or ISO scope and lacks Compliance API
Other Sources
- Atlas memory injection coverage The Register coverage of LayerX CSRF finding with OpenAI response (2025)
- Introducing ChatGPT Atlas OpenAI launch announcement for Atlas browser with agent mode (2025)