DeepSeek Agent Security Risks

General Assistant Agents deepseek.com Humble Providers
AI RISK QUADRANT POSITION DEFENSE CONTROLS (4) ATTACK SURFACE (3.72) EXPOSED GIANTS FORTIFIED LEADERS HUMBLE PROVIDERS TIGHT OPERATORS
AIRQ Score
0.82
Critical
Attack Surface
3.72
Medium
Blast Radius
1
Low
Defense Controls
4
High
About The Agent

DeepSeek is a cloud-hosted general-purpose AI chatbot and API operated by Hangzhou DeepSeek Artificial Intelligence. The default deployment exposes a multi-turn chat interface at chat.deepseek.com with built-in web search, file upload processing, and an OpenAI-compatible API supporting function calling for up to 128 developer-defined tools. The stateless API follows chat-completions conventions. The agent holds no persistent cross-session memory, executes no code on behalf of users, and delegates no tasks to subagents in its default configuration.

About the AI Risk Quadrant

Humble Providers agents combine a modest blast radius with meaningful attack-surface exposure. DeepSeek lands here because its cloud-hosted architecture eliminates code execution, file-system access, and credential handling from the default threat model, while demonstrated prompt-injection bypass rates and a confirmed XSS vulnerability elevate the input and output attack surfaces well above baseline. Operators integrating DeepSeek via API inherit responsibility for tool-execution sandboxing and output sanitization.

1 Key Risks

The most critical security risks an operator inherits when deploying this agent in its documented default configuration. DeepSeek presents high input-layer and output-layer risk from demonstrated jailbreak bypass and XSS, offset by minimal blast radius from the absence of autonomous execution capabilities.

Key Input Risks
The chat interface accepts arbitrary user prompts, web-search results, and uploaded documents without effective input filtering on the documented default configuration. Independent testing demonstrated prompt injection, system prompt leakage, and language-dependent safety bypass with full reproducibility [4][8].
Key Execution Risks
The default cloud service does not execute user code or provide a shell environment, delegating all tool execution to developer-implemented backends via the function-calling API. No sandbox documentation or isolation controls are published by the vendor [7].
Key Action Risks
DeepSeek performs no autonomous actions on its documented default configuration because no OAuth scopes, send capabilities, or deployment integrations exist. The highest-blast-radius scope available is network egress through built-in web search [7].
Key Output Risks
Output renders as rich text and markdown in the web UI without documented DLP, credential redaction, or URL sanitization on the output channel. The chat rendering domain carries demonstrated XSS risk per CVE-2025-26210 with CVSS 8.8 [1].
Key Monitoring Risks
No public documentation exists for audit logging, SIEM integration, or anomaly detection on the consumer service. The vendor terms disclaim output accuracy and the privacy policy names only commercially reasonable measures without specifying operator-facing telemetry [6][10].

2 AIRQ Scores

The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. DeepSeek scores 0.73 on the AIRQ composite index, reflecting elevated attack surface offset by minimal blast radius.

AIRQ Metrics

DeepSeek occupies the Humble Providers quadrant because its high attack-surface score pairs with a low blast-radius footprint.

The table below summarizes the four AIRQ axis scores and the composite index for DeepSeek.

Metric Score Comments
AIRQ Score 0.82 Composite risk reflects high attack surface offset by minimal blast radius and weak defenses.
Blast Radius 1 / 10 Low because the default service provides no code execution, file-system, or credential access.
Attack Surface 3.72 / 10 Elevated by agent-specific prompt-injection and XSS evidence penalties on three dimensions.
Defense Controls 4 / 15 Very weak defenses with 58 to 100 percent demonstrated bypass rates on input guardrails [2][3].

3 Attack Surface

Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. DeepSeek exposes ten attack-surface dimensions with three elevated by agent-specific evidence penalties totaling +4.0 points.

Attack Surface Metrics

Dimensions scoring 3 or above carry agent-specific evidence anchoring the penalty to a demonstrated exploit.

Each row shows the base score, evidence penalty, and adjusted total for one attack-surface dimension.

Surface Score Comments
User Input 4 / 4 Chat accepts arbitrary prompts and web-search content with language-dependent safety bypass demonstrated at 100 percent reproducibility [8].
External Data 2 / 4 Web search fetches untrusted content into the reasoning context and file uploads accept arbitrary document formats [7].
Memory 1 / 4 Server-side conversation history retained per account but API is stateless with no cross-session agentic memory [7].
Reasoning 3 / 4 Extended chain-of-thought reasoning with alignment bypassed through multilingual system prompts on the official platform [8].
Planning 1 / 4 Single-turn or multi-turn conversation only with no task decomposition or autonomous plan execution [7].
Tool Execution 1 / 4 API function-calling requires developer implementation of tool execution with no built-in shell or code sandbox [7].
Orchestration 1 / 4 No multi-agent delegation, background tasks, event hooks, or scheduling primitives in the architecture [7].
Inter-Agent 0 / 4 No inter-agent communication protocol exists as the service operates as a standalone endpoint [7].
Output Processing 5 / 4 CVE-2025-26210 demonstrates XSS via JavaScript execution in the chat rendering domain with CVSS 8.8 [1].
Configuration 1 / 4 Cloud-hosted with vendor-managed infrastructure where a backend ClickHouse database was exposed without authentication [9].

The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. DeepSeek triggers two of the three conditions on its documented default configuration, avoiding the X-axis floor.

Lethal Trifecta · Partial (2 of 3)

DeepSeek exhibits two of the three trifecta conditions in its documented default configuration:

  • Untrusted input — Chat accepts prompts from any user and web search fetches untrusted content with demonstrated bypass rates [2].
  • Sensitive data — The default configuration processes only user-provided text and files without accessing operator credential stores [6].
  • External egress — Built-in web search issues outbound HTTP requests and the output rendering surface carries demonstrated XSS exploitation risk [1].

4 Blast Radius

The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. DeepSeek presents minimal blast radius because the default configuration lacks autonomous execution, persistent storage, or credential-handling capabilities.

Blast Radius Metrics

All factors score 0 or 1 because DeepSeek lacks autonomous execution capabilities on its default configuration.

Each row scores one blast-radius factor on the 0 to 4 scale for DeepSeek's default deployment.

Factor Score Comments
Code execution 0 / 4 No user-code execution or shell environment exists in the default cloud deployment as function-calling delegates to developer backends [7].
File system access 0 / 4 Cloud-hosted with no local file-system interaction as file uploads are processed in-context only [7].
Network access 1 / 4 Web search sends HTTP requests to external URLs during a session but no outbound SSRF is documented [7].
Credential access 0 / 4 The service does not access operator credentials or API keys in its default configuration despite the Wiz-reported backend exposure [5][6].
Autonomous action 0 / 4 No autonomous actions exist as every interaction requires explicit user initiation without scheduling [7].
Deployment access 0 / 4 No deployment capability exists with no CI/CD integration or infrastructure provisioning in the default service [7].

5 Defense Controls

Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. DeepSeek provides minimal documented defense controls with demonstrated bypass rates undermining the few that exist.

Defense Controls Metrics

Scores reflect documented controls only, with confidence tiers indicating whether evidence is confirmed or inferred.

Each row scores one defense component on the 0 to 3 scale with the confidence tier for DeepSeek.

Component Score Comments
Input Guardrails 1 / 3 Content safety filter exists but independent testing demonstrated 58 to 100 percent jailbreak success rates [2][3].
Execution Isolation 1 / 3 Cloud-hosted architecture provides implicit isolation but no explicit sandbox controls are documented [7].
Action Controls 1 / 3 No autonomous actions to control as API function-calling requires developer implementation for execution [7].
Output Guardrails 1 / 3 Content filter returns content_filter stop reason but CVE-2025-26210 demonstrates XSS in the rendering surface [1].
Monitoring 0 / 3 No public documentation of audit logging or SIEM integration exists for the consumer service [6].

6 Hardening Tips

Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. Operators should layer external controls across all five defense components to compensate for weak vendor defaults.

Input Guardrails

Input guardrails intercept adversarial content before it reaches the reasoning loop.

Input Guardrails
  • Policy Establish an acceptable-use policy prohibiting sensitive enterprise data from reaching the DeepSeek API and require all integrations to route through a corporate proxy with content-inspection capability.
  • Configuration Deploy a third-party prompt-injection detection layer such as Lakera Guard or Azure AI Content Safety as a pre-filter before forwarding user input to the DeepSeek API endpoint.
  • Engineering Implement input-length caps, language-detection filtering, and structured-prompt templates that constrain freeform input to reduce the multilingual jailbreak attack surface.

Execution Isolation

Execution isolation contains what a compromised agent can do on the host.

Execution Isolation
  • Policy Require all API function-calling integrations to run tool execution inside a sandboxed environment with no access to production credentials or persistent file systems.
  • Configuration Use container-based isolation with gVisor or Firecracker for any tool-execution backend processing DeepSeek function-call outputs, limiting network egress to an allowlist.
  • Engineering Validate and sanitize all function-call arguments generated by the model before passing them to tool implementations, treating model output as untrusted input at the boundary.

Action Controls

Action controls govern which tools and actions the agent can invoke autonomously.

Action Controls
  • Policy Define a human-approval gate for any integration that translates DeepSeek API responses into write operations including database mutations, email sends, or financial transactions.
  • Configuration Implement rate limiting and scope restriction on any OAuth tokens or API keys accessible to systems processing DeepSeek outputs following least-privilege principles.
  • Engineering Build an explicit action-allowlist mapping permitted function-call names to validated parameter schemas and rejecting any model-generated call outside the allowlist.

Output Guardrails

Output guardrails inspect what the agent sends to other systems and users.

Output Guardrails
  • Policy Prohibit rendering DeepSeek output as executable HTML in any user-facing application and require all integration points to sanitize model output before display.
  • Configuration Deploy a DLP scanner on the output path that detects and redacts credentials, PII, and internal URLs before responses reach end users or downstream systems.
  • Engineering Apply strict Content-Security-Policy headers and HTML entity encoding on all surfaces rendering DeepSeek responses, mitigating the XSS vector demonstrated in CVE-2025-26210.

Monitoring

Monitoring captures what the agent did and surfaces anomalies for review.

Monitoring
  • Policy Require logging of all API interactions to an immutable audit store with a minimum 90-day retention for security investigation and incident response.
  • Configuration Forward API interaction logs to a SIEM platform with alerting rules for anomalous patterns such as repeated jailbreak attempts, unusual token volumes, or function-call spikes.
  • Engineering Instrument the integration layer with structured telemetry capturing prompt classification scores, content-filter trigger rates, and function-call frequency for anomaly detection.

7 References

The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [7, 13]) refer to entries below, listed in citation order.

Selected Vulnerabilities

  1. CVE-2025-26210 XSS in DeepSeek R1 through V3.1 allowing JavaScript execution in the chat rendering domain. CVSS 8.8 HIGH. Patch status unknown.

Selected Research

  1. DeepSeek Jailbreak Vulnerability Analysis Qualys TotalAI tested DeepSeek R1 against 885 jailbreak attacks across 18 techniques and found a 58 percent failure rate.
  2. Evaluating Security Risk in DeepSeek Cisco security evaluation achieved 100 percent attack success rate against DeepSeek R1 on 50 HarmBench prompts.
  3. Exposing the Security Risks of DeepSeek-R1 HiddenLayer demonstrated prompt injection, system prompt leakage, XSS generation, PII leakage, and DoS against DeepSeek R1.
  4. Wiz Research Uncovers Exposed DeepSeek Database Wiz discovered an unauthenticated ClickHouse database at DeepSeek exposing over one million chat logs and API secrets.

Vendor Documentation

  1. DeepSeek Privacy Policy The vendor privacy policy documents all personal data stored on servers in China with commercially reasonable security measures.
  2. DeepSeek API Documentation Official API docs describe the OpenAI-compatible endpoint with function calling support and content_filter stop reason.

Other Sources

  1. Language-dependent safety filter bypass GitHub issue demonstrating Russian-language system prompt completely removes safety restrictions with 100 percent reproducibility.
  2. DeepSeek Database Exposure Coverage InfoQ coverage of the Wiz ClickHouse incident where unauthenticated databases exposed chat history and API secrets.
  3. DeepSeek Terms of Use Vendor terms document that user inputs may be used to improve the model unless users opt out.