ChatGPT Agent Security Risks

General Assistant Agents | chatgpt.com | Exposed Giants

[Figure: AI Risk Quadrant position. Axes: Defense Controls (6), Attack Surface (6.38). Quadrants: Exposed Giants, Fortified Leaders, Humble Providers, Tight Operators.]

AIRQ Score: 5.06 (High)
Attack Surface: 6.38 (High)
Blast Radius: 5.5 (High)
Defense Controls: 6 (High)

About The Agent

ChatGPT is a cloud-hosted general-purpose AI assistant built by OpenAI that combines web browsing, code execution, persistent memory, and OAuth-gated connectors to external services in a single agentic system. The default configuration enables browsing, code interpreter, and cross-session memory on the consumer tier. The primary risk surface is the combination of multiple untrusted input channels with demonstrated data exfiltration paths through markdown rendering and DNS tunneling from the code execution sandbox.

About the AI Risk Quadrant

Exposed Giants are agents whose attack surface exceeds the midline while blast radius stays moderate, creating an exposure profile where data leakage outpaces the scope of a full compromise. ChatGPT lands here because its attack surface score (6.38) reflects six surfaces with demonstrated exploitation penalties while its blast radius (5.50) is contained by cloud-hosted sandboxed execution. Operators should prioritize output-channel hardening and trifecta-breaking controls to reduce the leakage pathways that define this quadrant placement.

1 Key Risks

The most critical security risks an operator inherits when deploying this agent in its documented default configuration. ChatGPT presents a broad input surface with demonstrated exploitation across memory, output rendering, and code execution channels on a consumer tier that provides no audit logging.

Key Input Risks
ChatGPT accepts attacker-controlled bytes through browsing, uploaded files, GPT Store instructions, and connector payloads with no external prompt shield on the consumer tier. The Agent System Card identifies indirect prompt injection from fetched web content as a primary risk vector.
Key Execution Risks
The code interpreter executes Python inside a network-restricted sandbox whose DNS channel was publicly exploitable until a February 2026 patch prompted by Check Point Research's disclosure. Agent mode grants terminal access with limited outbound connectivity and no per-command operator approval gate.
Key Action Risks
Memory writes via the bio tool fire without user consent when triggered by prompt injection in untrusted documents. Connected service integrations hold scoped OAuth tokens for Gmail, GitHub, and Google Drive with no per-action reconfirmation after the initial grant.
Key Output Risks
ChatGPT emits rich markdown output including rendered images and hyperlinks with url_safe domain filtering that has been repeatedly bypassed by independent researchers. Markdown image rendering remains the primary channel through which untrusted output reaches external servers.
Key Monitoring Risks
The consumer tier records conversation history only: it offers no audit trail, no SIEM integration, and no anomaly detection. The Enterprise Compliance Logs Platform with 30-day retention is an opt-in paid upgrade that most deployments do not enable.

2 AIRQ Scores

The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. ChatGPT carries a moderate AIRQ composite reflecting a broad attack surface with limited consumer-tier defenses and a cloud-contained blast radius.

AIRQ Metrics

ChatGPT places in the Exposed Giants quadrant with an attack surface of 6.38, blast radius of 5.50, and defense score of 6 out of 15.

Each axis measures a distinct risk dimension: attack surface out of 10, blast radius out of 10, defense controls out of 15, and the AIRQ composite out of 15.

Metric | Score | Comments
AIRQ Score | 5.06 | Moderate composite risk driven by a broad attack surface and limited consumer-tier defenses requiring operator-managed hardening.
Blast Radius | 5.5 / 10 | Contained by cloud-hosted sandboxed execution but expanded by OAuth connector credential scopes to Gmail, GitHub, and Google Drive.
Attack Surface | 6.38 / 10 | Six surfaces carry demonstrated-penalty scores with trifecta-complete status across untrusted input, sensitive data, and external egress.
Defense Controls | 6 / 15 | Vendor ships sandbox isolation and partial action controls but no consumer-tier audit logging or external prompt shield.
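
Because the four scores sit on different scales, it can help to express each as a fraction of its maximum before comparing them. The sketch below only restates the numbers and scales given above; it does not reproduce how the AIRQ composite itself is derived, which this report does not document.

```python
# Scores and scales exactly as stated in the table above.
SCORES = {
    "AIRQ Score": (5.06, 15),
    "Blast Radius": (5.5, 10),
    "Attack Surface": (6.38, 10),
    "Defense Controls": (6, 15),
}


def as_fraction(score: float, scale: float) -> float:
    """Return a score as a fraction of the maximum on its own scale."""
    return score / scale


if __name__ == "__main__":
    for name, (score, scale) in SCORES.items():
        print(f"{name}: {score} / {scale} = {as_fraction(score, scale):.1%}")
```

Read this way, the defense score covers well under half of its available range while the attack surface covers more than half of its own.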

3 Attack Surface

Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. ChatGPT accepts input from web browsing, file uploads, custom GPT instructions, OAuth connectors, and persistent memory with no instruction hierarchy separation on the consumer tier.

Attack Surface Metrics

Higher scores reflect surfaces where independent researchers have demonstrated exploitation on ChatGPT directly, with penalties applied for documented attack chains.

Each row maps an attack surface to its adjusted score and a comment describing the specific risk that surface presents for ChatGPT.

Surface | Score | Comments
User Input | 4 / 4 | Multiple unvalidated channels including web UI, API, and plugin-fetched content enter the reasoning loop with demonstrated exfiltration and account takeover. [1][3][7]
External Data | 4 / 4 | Web browsing fetches arbitrary untrusted pages, the GPT Store exposes community-authored instructions, and OAuth connectors pull private data into context. [14][17]
Memory | 4 / 4 | Persistent cross-session memory via the bio tool can be written without user consent through prompt injection, creating a durable poisoning vector. [9][11][24]
Reasoning | 2 / 4 | Multi-step reasoning with visible chain-of-thought is constrained to the declared task scope with no documented reasoning-loop manipulation beyond standard injection paths. [14][21]
Planning | 2 / 4 | Agent mode plans multi-step tasks with user-visible steps and Watch Mode pauses execution in browser-sensitive contexts for partial operator oversight. [14]
Tool Execution | 3 / 4 | Code interpreter runs Python in a sandboxed container that had an exploitable DNS egress channel until the February 2026 patch. [2][14]
Orchestration | 3 / 4 | Agent mode chains browsing, code execution, and connectors in multi-step sessions with demonstrated persistent command-and-control over instances. [8][16]
Inter-Agent | 2 / 4 | The GPT Store hosts community-built custom GPTs with Actions calling external APIs and no inter-agent authentication or message integrity verification. [12][20]
Output Processing | 4 / 4 | The url_safe output filter has been repeatedly bypassed to exfiltrate conversation data and stored memories through trusted-domain relay techniques. [6][10]
Configuration | 3 / 4 | The GPT Store marketplace auto-loads creator-defined instructions with minimal vetting, and parental controls confirm memory toggling is a user-managed setting. [5][12][18]

The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. ChatGPT ingests untrusted web content and connector data, accesses private user files and email via OAuth, and emits rendered output through browsing and markdown channels that have been exploited for exfiltration.

Lethal Trifecta · Complete (3 of 3)

ChatGPT exhibits all three of these conditions in its documented default configuration:

  • Untrusted input — Web browsing fetches arbitrary untrusted pages and custom GPT instructions from community creators flow directly into the reasoning loop. [7][14]
  • Sensitive data — OAuth connectors grant scoped access to Gmail messages, Google Drive files, and GitHub repositories containing private user data. [16][17]
  • External egress — Web browsing navigates to external URLs, GPT Actions send data to third-party APIs, and markdown image rendering triggers HTTP requests to external servers. [2][10]
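
The three conditions above can be expressed as a simple check over what a session is allowed to do. The following is a minimal sketch, assuming a hypothetical `SessionCapabilities` record that an operator fills in from their own configuration inventory; none of these names come from OpenAI or from the AIRQ methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCapabilities:
    """Hypothetical summary of what one agent session is allowed to do."""
    reads_untrusted_input: bool    # e.g. web browsing, custom GPT instructions
    accesses_sensitive_data: bool  # e.g. OAuth connectors to mail or files
    has_external_egress: bool      # e.g. browsing, GPT Actions, image rendering


def trifecta_count(caps: SessionCapabilities) -> int:
    """Count how many of the three trifecta conditions hold."""
    return sum(
        [
            caps.reads_untrusted_input,
            caps.accesses_sensitive_data,
            caps.has_external_egress,
        ]
    )


def is_trifecta_complete(caps: SessionCapabilities) -> bool:
    """True only when all three conditions hold in the same session."""
    return trifecta_count(caps) == 3


# The documented default configuration described above: 3 of 3.
default_config = SessionCapabilities(True, True, True)

# A hypothetical hardened profile with connectors disabled: 2 of 3.
no_connectors = SessionCapabilities(
    reads_untrusted_input=True,
    accesses_sensitive_data=False,
    has_external_egress=True,
)
```

Removing any one condition breaks the chain, which is why the hardening tips later in the report focus on restricting input channels and output rendering rather than on eliminating prompt injection itself.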

4 Blast Radius

The blast radius is what an attacker who controls the agent can reach: which systems they touch, which credentials they read, and which actions they take without operator approval. A ChatGPT compromise reaches the sandboxed code execution environment, OAuth-scoped access to external services, and agent-mode browser sessions, but it does not extend to deployment infrastructure.

Blast Radius Metrics

Higher blast scores indicate factors where ChatGPT can access or modify resources beyond the immediate conversation sandbox.

Each row maps a blast factor to its score and the specific ChatGPT capability that determines the scope of potential damage.

Factor | Score | Comments
Code execution | 2 / 4 | Python interpreter runs in a sandboxed container with network restrictions; agent mode terminal provides scoped shell access with limited outbound connectivity. [2][14]
File system access | 2 / 4 | Read-write access is scoped to the session workspace with file generation and downloads supported but no host filesystem access from the cloud runtime. [14]
Network access | 3 / 4 | Web browsing navigates to arbitrary external URLs and the code interpreter DNS channel was uncontrolled until the February 2026 patch. [2][19]
Credential access | 3 / 4 | OAuth connectors grant scoped access to Gmail, GitHub, and Google Drive user data; agent mode browser interacts with authenticated web sessions. [4][16]
Autonomous action | 2 / 4 | Agent mode executes multi-step tasks with Watch Mode pausing in browser-sensitive contexts; connector writes proceed without per-action approval after initial OAuth grant. [14]
Deployment access | 1 / 4 | ChatGPT cannot deploy code to infrastructure or modify production environments; no IaC, CI/CD, or cloud console access is available. [14]

5 Defense Controls

Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. OpenAI documents sandbox isolation and partial action controls but the consumer tier lacks audit logging, external prompt shields, and comprehensive output filtering.

Defense Controls Metrics

Higher defense scores indicate stronger vendor-implemented safeguards that reduce the operator's residual hardening burden.

Each component is scored based on what the vendor implements by default versus what the operator must configure or build independently.

Component | Score | Comments
Input Guardrails | 1 / 3 | Model-level safety training and automated monitors provide basic filtering with no external prompt shield; the trust portal documents compliance but not an ML-based injection detection stack. [13][14][15]
Execution Isolation | 2 / 3 | Code interpreter runs in a sandboxed container with HTTP blocked and DNS now restricted; the agent mode terminal has limited network access. [2][14]
Action Controls | 1 / 3 | User confirmation dialogs for GPT Actions and Watch Mode exist alongside a public safety bug bounty but the memory bio tool is invocable without user consent. [9][14][22]
Output Guardrails | 1 / 3 | The url_safe feature restricts rendered domains and CSP-based image mitigations are shipped but multiple bypass demonstrations have been documented. [10][15]
Monitoring | 1 / 3 | Consumer tier provides conversation history only with no audit logging; the Enterprise Compliance Logs Platform with SIEM integration is a paid opt-in upgrade. [23]
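
The five component scores in this table add up to the headline Defense Controls score of 6 out of 15. The short check below makes that arithmetic explicit; it is an observation about the numbers in this report, not a statement of the official scoring formula.

```python
# Component scores as listed in the Defense Controls table (each out of 3).
DEFENSE_COMPONENTS = {
    "Input Guardrails": 1,
    "Execution Isolation": 2,
    "Action Controls": 1,
    "Output Guardrails": 1,
    "Monitoring": 1,
}
MAX_PER_COMPONENT = 3

total = sum(DEFENSE_COMPONENTS.values())
maximum = MAX_PER_COMPONENT * len(DEFENSE_COMPONENTS)

if __name__ == "__main__":
    print(f"Defense Controls: {total} / {maximum}")
```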

6 Hardening Tips

Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. Operators should prioritize breaking the trifecta by restricting untrusted input channels and hardening output rendering controls before addressing monitoring gaps.

Input Guardrails

Input guardrails intercept adversarial content before it reaches the reasoning loop.

  • Policy: Require all external content ingested via browsing or connectors to pass through an operator-managed content classifier before entering the reasoning loop.
  • Configuration: Disable memory persistence and restrict browsing to an operator-curated domain allowlist for sensitive workloads using the data controls settings.
  • Engineering: Deploy an external prompt injection detection layer that inspects fetched web content and connector payloads before they reach the context window.
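
As an illustration of the domain allowlist idea, the sketch below shows one way an operator-managed proxy might decide whether a URL may be fetched. The domains in `ALLOWED_DOMAINS` and the function name are hypothetical; this is not a ChatGPT setting or API, only a sketch of the matching logic.

```python
from urllib.parse import urlsplit

# Hypothetical operator-curated allowlist; replace with real domains.
ALLOWED_DOMAINS = frozenset({"docs.example.com", "intranet.example.com"})


def is_allowed(url: str, allowed: frozenset = ALLOWED_DOMAINS) -> bool:
    """Allow only HTTPS URLs whose host is an allowlisted domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # hostname strips userinfo and port, so "https://a@evil.test/" resolves to "evil.test".
    host = (parts.hostname or "").lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Matching on the parsed hostname rather than on a substring of the URL matters here: a lookalike such as `docs.example.com.evil.test` contains the allowlisted name but does not end with it, so it is rejected.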

Execution Isolation

Execution isolation contains what a compromised agent can do on the host.

  • Policy: Restrict code interpreter and terminal usage to designated user roles and require managerial approval for agent mode activation in enterprise workspaces.
  • Configuration: Disable the code interpreter tool for user groups that do not require it using the admin console GPT controls and workspace permissions.
  • Engineering: Route all code execution sandbox egress through an operator-managed DNS resolver with response policy zones to detect tunneling attempts.
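
DNS tunneling typically encodes data in long or random-looking subdomain labels, so a resolver-side check can flag suspicious query names. The following is a rough heuristic sketch, not a complete detector: the thresholds are assumed values chosen for illustration, and treating the last two labels as the registered domain is a simplification that ignores multi-part public suffixes.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_tunneling(
    qname: str,
    max_label_len: int = 40,     # assumed threshold
    min_len_for_entropy: int = 16,  # assumed threshold
    max_entropy: float = 3.5,    # assumed threshold
) -> bool:
    """Flag query names whose subdomain labels are unusually long or high-entropy."""
    labels = [label for label in qname.rstrip(".").split(".") if label]
    # Simplification: treat the last two labels as the registered domain.
    for label in labels[:-2]:
        if len(label) > max_label_len:
            return True
        if len(label) >= min_len_for_entropy and shannon_entropy(label) > max_entropy:
            return True
    return False
```

A real deployment would tune these thresholds against observed traffic and combine them with per-domain query volume, since a single label check is easy to evade with shorter chunks.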

Action Controls

Action controls govern which tools and actions the agent can invoke autonomously.

  • Policy: Audit all installed GPT Actions quarterly and revoke OAuth grants for connectors that are no longer in active use across the workspace.
  • Configuration: Disable the memory bio tool via data controls for workloads processing sensitive data to prevent prompt-injection-driven persistent memory writes.
  • Engineering: Implement an API gateway between GPT Actions and third-party endpoints that enforces per-request data classification and blocks sensitive content categories.
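
The quarterly audit in the Policy tip can be partly automated if the operator keeps an inventory of connector grants with a last-used date. The sketch below assumes such an inventory exists; the `ConnectorGrant` record, its fields, and the 90-day idle threshold are all hypothetical and would need to be mapped onto whatever export the workspace actually provides.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ConnectorGrant:
    """Hypothetical inventory record for one user's OAuth connector grant."""
    user: str
    connector: str
    last_used: date


def stale_grants(grants, today: date, max_idle_days: int = 90):
    """Return grants not used within the last max_idle_days days."""
    cutoff = today - timedelta(days=max_idle_days)
    return [g for g in grants if g.last_used < cutoff]


# Illustrative data only.
GRANTS = [
    ConnectorGrant("alice", "Gmail", date(2026, 6, 1)),
    ConnectorGrant("alice", "GitHub", date(2026, 1, 15)),
    ConnectorGrant("bob", "Google Drive", date(2026, 5, 20)),
]

if __name__ == "__main__":
    for grant in stale_grants(GRANTS, today=date(2026, 6, 30)):
        print(f"Review for revocation: {grant.user} / {grant.connector}")
```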

Output Guardrails

Output guardrails inspect what the agent sends to other systems and users.

  • Policy: Require all ChatGPT-generated content to undergo manual review before being forwarded to external recipients or published to downstream systems.
  • Configuration: Enable Temporary Chat mode for sessions processing untrusted content to prevent conversation data from persisting in history or memory stores.
  • Engineering: Deploy a data loss prevention proxy that inspects rendered markdown output for encoded exfiltration patterns and blocks requests to unallowlisted image domains.
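
To make the markdown inspection step concrete, the sketch below extracts image URLs from markdown text and flags those that point at hosts outside an allowlist or carry unusually long query strings. The allowlist contents, the length threshold, and the function name are assumptions for illustration; the regular expression covers only the common inline `![alt](url)` form, not reference-style images or raw HTML.

```python
import re
from urllib.parse import urlsplit

# Matches inline markdown images: ![alt](url ...). Captures the URL only.
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(\s*<?([^)\s>]+)")

ALLOWED_IMAGE_HOSTS = frozenset({"cdn.example.com"})  # hypothetical allowlist
MAX_QUERY_LEN = 64  # assumed threshold for "suspiciously long" query strings


def flag_images(markdown: str) -> list[tuple[str, str]]:
    """Return (url, reason) pairs for image URLs that should be blocked or reviewed."""
    findings = []
    for url in IMAGE_PATTERN.findall(markdown):
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        if host not in ALLOWED_IMAGE_HOSTS:
            findings.append((url, "host not allowlisted"))
        elif len(parts.query) > MAX_QUERY_LEN:
            findings.append((url, "long query string"))
    return findings
```

The query-length check is included because the bypasses described in this report relay data through trusted domains, so a host allowlist alone would not catch them; it remains a partial heuristic rather than a guarantee.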

Monitoring

Monitoring captures what the agent did and surfaces anomalies for review.

  • Policy: Mandate the Enterprise or Business tier for all organizational deployments to gain access to the Compliance Logs Platform and SIEM integration.
  • Configuration: Configure the Compliance API to forward all audit log events to the organization SIEM with alerts on anomalous tool invocation patterns.
  • Engineering: Build a custom log analysis pipeline that correlates ChatGPT compliance log events with network telemetry to detect exfiltration attempts.
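
One simple form of the correlation described in the Engineering tip is a time-window join: for each tool invocation in the compliance log, look for outbound requests by the same user shortly afterwards. The sketch below assumes both feeds have already been parsed into dictionaries with `user` and `time` keys; those field names, the `tool` and `host` fields, and the 30-second window are hypothetical and do not reflect the actual Compliance Logs Platform schema.

```python
from datetime import datetime, timedelta


def correlate(tool_events, net_events, window=timedelta(seconds=30)):
    """Pair each tool event with same-user network events that follow it within the window."""
    matches = []
    for tool in tool_events:
        for net in net_events:
            delta = net["time"] - tool["time"]
            if tool["user"] == net["user"] and timedelta(0) <= delta <= window:
                matches.append((tool, net))
    return matches


# Illustrative data only.
T0 = datetime(2026, 1, 1, 12, 0, 0)
TOOL_EVENTS = [{"user": "alice", "time": T0, "tool": "browser"}]
NET_EVENTS = [
    {"user": "alice", "time": T0 + timedelta(seconds=10), "host": "evil.test"},
    {"user": "alice", "time": T0 + timedelta(minutes=5), "host": "cdn.example.com"},
    {"user": "bob", "time": T0 + timedelta(seconds=5), "host": "other.test"},
]

if __name__ == "__main__":
    for tool, net in correlate(TOOL_EVENTS, NET_EVENTS):
        print(f"{tool['user']}: {tool['tool']} followed by request to {net['host']}")
```

The nested loop is quadratic and only suitable for small batches; a production pipeline would sort both feeds by time or perform the join inside the SIEM.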

7 References

The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [7][13]) refer to entries below, listed in citation order.

Selected Vulnerabilities

  1. ChatGPT Conversation Exfiltration. MITRE ATLAS case study documenting indirect prompt injection via untrusted web content causing ChatGPT to exfiltrate conversation data through markdown image rendering.
  2. ChatGPT Data Leakage via DNS Tunneling. Check Point Research disclosed a hidden outbound DNS channel in the ChatGPT code execution sandbox that enabled silent data exfiltration and bidirectional command-and-control.
  3. ChatGPT Account Takeover via Web Cache Deception. An independent researcher demonstrated account takeover by exploiting URL parser confusion between Cloudflare CDN and the ChatGPT web server to cache authentication tokens.
  4. ChatGPT Account Compromise Data Exposure. AI Incident Database records a ChatGPT account compromise that led to unintended exposure of sensitive conversations including login credentials and personal data.
  5. ChatGPT macOS Unencrypted Storage. AI Incident Database records that the ChatGPT macOS app stored user conversations in unencrypted plain text files accessible to any local process.
  6. ChatGPT Share Links Public Exposure. AI Incident Database records over 100,000 ChatGPT shared conversations indexed by search engines and archived, exposing API keys and sensitive business data.

Selected Research

  7. ChatGPT Plugins Data Exfiltration via Markdown Injection. Embrace The Red demonstrated that a malicious website can exfiltrate ChatGPT conversation history through markdown image rendering during indirect prompt injection.
  8. ZombAI Command-and-Control via Prompt Injection. Embrace The Red established a command-and-control system over ChatGPT instances by combining prompt injection with a url_safe bypass.
  9. ChatGPT Persistent Denial of Service via Memory Poisoning. Embrace The Red demonstrated that prompt injection through untrusted documents can plant malicious memories in ChatGPT that cause persistent denial of service.
  10. ChatGPT Chat History Exfiltration via url_safe Bypass. Embrace The Red demonstrated exfiltration of ChatGPT chat history and stored memories by bypassing the url_safe rendering feature.
  11. Sleeper Memory Poisoning in LLM Agents. Academic study demonstrating that adversarial external content can coerce assistants including ChatGPT into storing fabricated memories that later influence behavior.
  12. Custom GPTs Vulnerabilities in the OpenAI Ecosystem. Large-scale analysis of 14,904 custom GPTs finding that 95 percent lack defenses and 92.9 percent are vulnerable to system prompt leakage.

Vendor Documentation

  13. OpenAI Trust Portal. Centralized compliance portal hosting SOC 2 Type 2 reports and ISO 27001 and 27701 certificates for ChatGPT business services.
  14. ChatGPT Agent System Card. Vendor-published system card documenting the ChatGPT agent architecture and prompt injection mitigations, including Watch Mode and terminal network restrictions.
  15. Security and Privacy at OpenAI. Vendor security overview documenting SOC 2 and ISO certifications, HIPAA BAA availability, and infrastructure security practices.
  16. Introducing ChatGPT Agent. Product announcement documenting the unified agentic system combining a visual browser, text browser, terminal, and OAuth connectors.
  17. ChatGPT Apps and Integrations. Vendor documentation of third-party app integrations, including workspace permissions, audit logs, and data privacy controls.
  18. ChatGPT Parental Controls. Vendor documentation of the memory toggle, model training opt-out, and safety protections on the ChatGPT product domain.
  19. ChatGPT Plans and Security Features. Vendor pricing page documenting security features by tier, including SOC 2 compliance, SSO, SCIM, and the Compliance API Logs Platform.

Other Sources

  20. OWASP Top 10 for LLM Applications 2025. Industry framework ranking prompt injection as LLM01 and excessive agency as LLM06, reflecting real-world exploitation patterns across agentic AI products.
  21. Prompt Injection Explained. Simon Willison's analysis explaining why prompt injection remains an unsolved vulnerability for any application including ChatGPT.
  22. OpenAI Safety Bug Bounty. OpenAI launched a public Safety Bug Bounty covering agentic prompt injection and data exfiltration with 409 confirmed security vulnerabilities.
  23. OpenAI Compliance Logs Platform. Documentation for the Enterprise and Edu Compliance Logs Platform providing immutable audit log events with SIEM integration.
  24. How ChatGPT Memory Works. Reverse engineering analysis of ChatGPT memory and chat history features revealing that Recent Conversation Content builds a persistent user profile.