1 Key Risks
The most critical security risks an operator inherits when deploying this agent in its documented default configuration. DeepSeek presents high input-layer and output-layer risk from demonstrated jailbreak bypass and XSS, offset by minimal blast radius from the absence of autonomous execution capabilities.
2 AIRQ Scores
The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. DeepSeek scores 0.73 on the AIRQ composite index, reflecting elevated attack surface offset by minimal blast radius.
DeepSeek occupies the Humble Providers quadrant because its high attack-surface score pairs with a low blast-radius footprint.
The table below summarizes the composite AIRQ index and the three axis scores for DeepSeek.
| Metric | Score | Comments |
|---|---|---|
| AIRQ Score | 0.82 | Composite risk reflects a high attack surface and weak defenses, partly offset by a minimal blast radius. |
| Blast Radius | 1 / 10 | Low because the default service provides no code execution, file-system, or credential access. |
| Attack Surface | 3.72 / 10 | Elevated by agent-specific prompt-injection and XSS evidence penalties on three dimensions. |
| Defense Controls | 4 / 15 | Very weak defenses with 58 to 100 percent demonstrated bypass rates on input guardrails [2][3]. |
3 Attack Surface
Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. DeepSeek exposes ten attack-surface dimensions with three elevated by agent-specific evidence penalties totaling +4.0 points.
Dimensions scoring 3 or above carry agent-specific evidence anchoring the penalty to a demonstrated exploit.
Each row shows the adjusted score (base score plus any evidence penalty) for one attack-surface dimension.
| Surface | Score | Comments |
|---|---|---|
| User Input | 4 / 4 | Chat accepts arbitrary prompts and web-search content with language-dependent safety bypass demonstrated at 100 percent reproducibility [8]. |
| External Data | 2 / 4 | Web search fetches untrusted content into the reasoning context and file uploads accept arbitrary document formats [7]. |
| Memory | 1 / 4 | Server-side conversation history is retained per account, but the API is stateless with no cross-session agentic memory [7]. |
| Reasoning | 3 / 4 | Extended chain-of-thought reasoning with alignment bypassed through multilingual system prompts on the official platform [8]. |
| Planning | 1 / 4 | Single-turn or multi-turn conversation only with no task decomposition or autonomous plan execution [7]. |
| Tool Execution | 1 / 4 | API function-calling requires developer implementation of tool execution with no built-in shell or code sandbox [7]. |
| Orchestration | 1 / 4 | No multi-agent delegation, background tasks, event hooks, or scheduling primitives in the architecture [7]. |
| Inter-Agent | 0 / 4 | No inter-agent communication protocol exists as the service operates as a standalone endpoint [7]. |
| Output Processing | 5 / 4 | CVE-2025-26210 demonstrates XSS via JavaScript execution in the chat rendering domain with CVSS 8.8 [1]. |
| Configuration | 1 / 4 | Cloud-hosted with vendor-managed infrastructure where a backend ClickHouse database was exposed without authentication [9]. |
The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. DeepSeek triggers two of the three conditions on its documented default configuration, avoiding the X-axis floor.
Each condition is assessed below for the documented default configuration:
- Untrusted input (met): Chat accepts prompts from any user, and web search fetches untrusted content, with demonstrated bypass rates [2].
- Sensitive data (not met): The default configuration processes only user-provided text and files and does not access operator credential stores [6].
- External egress (met): Built-in web search issues outbound HTTP requests, and the output rendering surface carries demonstrated XSS exploitation risk [1].
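The trifecta test reduces to a conjunction of three yes/no judgements. A minimal Python sketch, assuming each condition has already been assessed as a boolean; the class and field names are hypothetical and not part of any AIRQ tooling:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrifectaConditions:
    untrusted_input: bool  # processes content from untrusted sources
    sensitive_data: bool   # can read private data or credentials
    external_egress: bool  # can send data outside the session

    def count(self) -> int:
        # Booleans sum as 0/1, giving the number of conditions met.
        return sum((self.untrusted_input, self.sensitive_data, self.external_egress))

    def triggered(self) -> bool:
        # All three must hold in the same session for full-chain exfiltration.
        return self.count() == 3


# DeepSeek default configuration as assessed in this section.
deepseek = TrifectaConditions(untrusted_input=True, sensitive_data=False, external_egress=True)
print(deepseek.count(), deepseek.triggered())  # prints: 2 False
```

Enabling any integration that gives the model access to private data would flip the middle flag and trigger the full trifecta.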
4 Blast Radius
The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. DeepSeek presents minimal blast radius because the default configuration lacks autonomous execution, persistent storage, or credential-handling capabilities.
All factors score 0 or 1 because DeepSeek lacks autonomous execution capabilities in its default configuration.
Each row scores one blast-radius factor on the 0 to 4 scale for DeepSeek's default deployment.
| Factor | Score | Comments |
|---|---|---|
| Code execution | 0 / 4 | No user-code execution or shell environment exists in the default cloud deployment, because function calling delegates execution to developer backends [7]. |
| File system access | 0 / 4 | Cloud-hosted with no local file-system interaction; file uploads are processed in-context only [7]. |
| Network access | 1 / 4 | Web search sends HTTP requests to external URLs during a session, but no SSRF vulnerability is documented [7]. |
| Credential access | 0 / 4 | The service does not access operator credentials or API keys in its default configuration, despite the Wiz-reported backend exposure [5][6]. |
| Autonomous action | 0 / 4 | No autonomous actions exist; every interaction requires explicit user initiation, and no scheduling is available [7]. |
| Deployment access | 0 / 4 | No deployment capability exists: the default service has no CI/CD integration or infrastructure provisioning [7]. |
5 Defense Controls
Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. DeepSeek provides minimal documented defense controls with demonstrated bypass rates undermining the few that exist.
Scores reflect documented controls only.
Each row scores one defense component on the 0 to 3 scale for DeepSeek.
| Component | Score | Comments |
|---|---|---|
| Input Guardrails | 1 / 3 | Content safety filter exists but independent testing demonstrated 58 to 100 percent jailbreak success rates [2][3]. |
| Execution Isolation | 1 / 3 | Cloud-hosted architecture provides implicit isolation but no explicit sandbox controls are documented [7]. |
| Action Controls | 1 / 3 | There are no autonomous actions to control, because API function calling requires a developer implementation to execute anything [7]. |
| Output Guardrails | 1 / 3 | The content filter returns a content_filter stop reason, but CVE-2025-26210 demonstrates XSS in the rendering surface [1]. |
| Monitoring | 0 / 3 | No public documentation of audit logging or SIEM integration exists for the consumer service [6]. |
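Assuming the headline Defense Controls score is the unweighted sum of the component scores (the figures are consistent with that reading, but the aggregation method is not stated here), the 4 / 15 figure can be reproduced as:

```python
# Component scores from the Defense Controls table (each on a 0 to 3 scale).
defense_scores = {
    "Input Guardrails": 1,
    "Execution Isolation": 1,
    "Action Controls": 1,
    "Output Guardrails": 1,
    "Monitoring": 0,
}
MAX_PER_COMPONENT = 3

total = sum(defense_scores.values())
maximum = MAX_PER_COMPONENT * len(defense_scores)
print(f"{total} / {maximum}")  # prints: 4 / 15
```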
6 Hardening Tips
Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. Operators should layer external controls across all five defense components to compensate for weak vendor defaults.
Input Guardrails
Input guardrails intercept adversarial content before it reaches the reasoning loop.
- Policy: Establish an acceptable-use policy prohibiting sensitive enterprise data from reaching the DeepSeek API and require all integrations to route through a corporate proxy with content-inspection capability.
- Configuration: Deploy a third-party prompt-injection detection layer such as Lakera Guard or Azure AI Content Safety as a pre-filter before forwarding user input to the DeepSeek API endpoint.
- Engineering: Implement input-length caps, language-detection filtering, and structured-prompt templates that constrain freeform input to reduce the multilingual jailbreak attack surface.
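A minimal Python sketch of the engineering tip. The length cap, the script allowlist, and the template text are hypothetical values chosen for illustration; script detection uses the first word of each letter's Unicode character name as a crude stand-in for a real language-detection library.

```python
import unicodedata
from typing import List, Optional

MAX_INPUT_CHARS = 4000      # hypothetical cap; tune per deployment
ALLOWED_SCRIPTS = {"LATIN"}  # hypothetical allowlist of Unicode script prefixes

# Hypothetical structured template that confines user text to one slot.
TEMPLATE = "Answer the customer question below.\nQuestion: {question}"


def script_of(ch: str) -> Optional[str]:
    """Return the first word of a letter's Unicode name, e.g. 'LATIN' or 'CYRILLIC'."""
    if not ch.isalpha():
        return None
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:  # character has no Unicode name
        return None


def check_input(text: str) -> List[str]:
    """Return a list of policy violations; an empty list means the input passes."""
    problems = []
    if len(text) > MAX_INPUT_CHARS:
        problems.append("too_long")
    scripts = {s for s in map(script_of, text) if s is not None}
    disallowed = scripts - ALLOWED_SCRIPTS
    if disallowed:
        problems.append("disallowed_script:" + ",".join(sorted(disallowed)))
    return problems


def build_prompt(question: str) -> str:
    """Reject non-conforming input, otherwise slot it into the fixed template."""
    problems = check_input(question)
    if problems:
        raise ValueError(f"rejected input: {problems}")
    return TEMPLATE.format(question=question)
```

For example, `check_input("Привет")` returns `["disallowed_script:CYRILLIC"]`, so `build_prompt` refuses to forward it. A script allowlist blunts the language-dependent bypass only for deployments that genuinely serve a single language.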
Execution Isolation
Execution isolation contains what a compromised agent can do on the host.
- Policy: Require all API function-calling integrations to run tool execution inside a sandboxed environment with no access to production credentials or persistent file systems.
- Configuration: Run any tool-execution backend that processes DeepSeek function-call outputs inside a sandbox such as gVisor or Firecracker microVMs, and limit network egress to an allowlist.
- Engineering: Validate and sanitize all function-call arguments generated by the model before passing them to tool implementations, treating model output as untrusted input at the boundary.
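A sketch of treating function-call arguments as untrusted at the boundary, assuming the OpenAI-compatible convention in which a tool call's arguments arrive as a JSON-encoded string. `SANDBOX_ROOT` and both helpers are hypothetical; the path check is lexical only and does not resolve symlinks, which a real backend should also do.

```python
import json
from pathlib import PurePosixPath

SANDBOX_ROOT = PurePosixPath("/sandbox/work")  # hypothetical sandbox directory


def parse_arguments(raw: str) -> dict:
    """Parse model-generated JSON arguments, rejecting anything but a JSON object."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from None
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return args


def safe_path(user_path: str) -> PurePosixPath:
    """Map a model-supplied relative path under SANDBOX_ROOT.

    Lexical check only: absolute paths and '..' components are rejected,
    but symlinks inside the sandbox are not resolved here.
    """
    candidate = PurePosixPath(user_path)
    if candidate.is_absolute() or ".." in candidate.parts:
        raise ValueError(f"path escapes sandbox: {user_path!r}")
    return SANDBOX_ROOT / candidate
```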
Action Controls
Action controls govern which tools and actions the agent can invoke autonomously.
- Policy: Define a human-approval gate for any integration that translates DeepSeek API responses into write operations, including database mutations, email sends, or financial transactions.
- Configuration: Apply rate limiting and least-privilege scope restrictions to any OAuth tokens or API keys accessible to systems that process DeepSeek outputs.
- Engineering: Build an explicit action allowlist that maps permitted function-call names to validated parameter schemas and rejects any model-generated call outside the allowlist.
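One way to combine the allowlist with the human-approval gate, sketched in Python. The tool names, parameter schemas, and return values are hypothetical; a production system would more likely validate against full JSON Schema definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    params: dict[str, type]          # required parameter name -> expected Python type
    requires_approval: bool = False  # True for write operations needing a human gate


# Hypothetical allowlist; tool names and parameters are illustrative only.
ALLOWLIST = {
    "lookup_order": ToolSpec(params={"order_id": str}),
    "send_email": ToolSpec(
        params={"to": str, "subject": str, "body": str}, requires_approval=True
    ),
}


def authorize_call(name: str, args: dict) -> str:
    """Return 'allowed' or 'needs_approval', or raise if the call is not permitted."""
    spec = ALLOWLIST.get(name)
    if spec is None:
        raise PermissionError(f"tool not on allowlist: {name!r}")
    mismatched = set(args) ^ set(spec.params)  # unexpected or missing parameters
    if mismatched:
        raise ValueError(f"parameter mismatch for {name!r}: {sorted(mismatched)}")
    for key, expected in spec.params.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{name}.{key} must be {expected.__name__}")
    return "needs_approval" if spec.requires_approval else "allowed"
```

Calls that return `needs_approval` would be queued for a reviewer instead of executed, so the model can propose a write operation but never perform one on its own.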
Output Guardrails
Output guardrails inspect what the agent sends to other systems and users.
- Policy: Prohibit rendering DeepSeek output as executable HTML in any user-facing application and require all integration points to sanitize model output before display.
- Configuration: Deploy a DLP scanner on the output path that detects and redacts credentials, PII, and internal URLs before responses reach end users or downstream systems.
- Engineering: Apply strict Content-Security-Policy headers and HTML entity encoding on all surfaces rendering DeepSeek responses, mitigating the XSS vector demonstrated in CVE-2025-26210.
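A minimal sketch of the output path in Python, combining a crude redaction pass with HTML entity encoding via the standard-library `html.escape`. The regex patterns (including the assumed `sk-` key prefix) and the CSP value are illustrative assumptions, not a substitute for a real DLP scanner.

```python
import html
import re

# Hypothetical CSP header to attach to every response that renders model output.
CSP_HEADER = (
    "Content-Security-Policy",
    "default-src 'self'; script-src 'self'; object-src 'none'; base-uri 'none'",
)

# Illustrative DLP-style patterns only; a production scanner covers far more.
REDACTIONS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED_KEY]"),            # assumed key format
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED_EMAIL]"),    # email addresses
]


def render_model_output(text: str) -> str:
    """Redact sensitive patterns, then entity-encode so no markup can execute."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return html.escape(text, quote=True)
```

Redaction runs before encoding so the patterns see the raw model text; the encoded result is safe to insert as element content, though attribute and script contexts need their own context-specific encoding.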
Monitoring
Monitoring captures what the agent did and surfaces anomalies for review.
- Policy: Require logging of all API interactions to an immutable audit store with a minimum 90-day retention for security investigation and incident response.
- Configuration: Forward API interaction logs to a SIEM platform with alerting rules for anomalous patterns such as repeated jailbreak attempts, unusual token volumes, or function-call spikes.
- Engineering: Instrument the integration layer with structured telemetry capturing prompt classification scores, content-filter trigger rates, and function-call frequency for anomaly detection.
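A sketch of structured telemetry with one alerting rule, assuming the integration can read each response's finish reason (the API documentation describes a content_filter stop reason). The field names, window, and threshold are hypothetical; in practice the JSON line would be shipped to the SIEM rather than returned.

```python
import json
import time
from collections import defaultdict, deque
from typing import Optional

WINDOW_SECONDS = 300        # hypothetical sliding window for the alert rule
FILTER_ALERT_THRESHOLD = 3  # hypothetical: content-filter hits per user per window

# Per-user timestamps of recent content-filter hits (in-memory, single process).
_filter_hits: dict[str, deque] = defaultdict(deque)


def record_interaction(user_id: str, finish_reason: str, total_tokens: int,
                       function_calls: int, now: Optional[float] = None) -> tuple[str, bool]:
    """Build a JSON log line and report whether the user crossed the alert threshold."""
    now = time.time() if now is None else now
    event = {
        "ts": now,
        "user_id": user_id,
        "finish_reason": finish_reason,
        "total_tokens": total_tokens,
        "function_calls": function_calls,
    }
    alert = False
    if finish_reason == "content_filter":
        hits = _filter_hits[user_id]
        hits.append(now)
        # Drop hits older than the window before counting.
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        alert = len(hits) >= FILTER_ALERT_THRESHOLD
    event["alert"] = alert
    return json.dumps(event, sort_keys=True), alert
```

Repeated content-filter hits from one account are a reasonable proxy for the repeated jailbreak attempts the configuration tip targets; token-volume and function-call spikes could be added as further rules over the same event stream.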
7 References
The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [2][3]) refer to the entries below, which are numbered sequentially from [1] across all source groups.
Selected Vulnerabilities
- [1] CVE-2025-26210. XSS in DeepSeek R1 through V3.1 allowing JavaScript execution in the chat rendering domain. CVSS 8.8 HIGH. Patch status unknown.
Selected Research
- [2] DeepSeek Jailbreak Vulnerability Analysis. Qualys TotalAI tested DeepSeek R1 against 885 jailbreak attacks across 18 techniques and found that it failed 58 percent of them.
- [3] Evaluating Security Risk in DeepSeek. Cisco's security evaluation achieved a 100 percent attack success rate against DeepSeek R1 on 50 HarmBench prompts.
- [4] Exposing the Security Risks of DeepSeek-R1. HiddenLayer demonstrated prompt injection, system prompt leakage, XSS generation, PII leakage, and DoS against DeepSeek R1.
- [5] Wiz Research Uncovers Exposed DeepSeek Database. Wiz discovered an unauthenticated ClickHouse database at DeepSeek exposing over one million log lines, including chat history and API secrets.
Vendor Documentation
- [6] DeepSeek Privacy Policy. The vendor privacy policy states that personal data is stored on servers in China and protected by commercially reasonable security measures.
- [7] DeepSeek API Documentation. Official API docs describe the OpenAI-compatible endpoint, including function calling support and the content_filter stop reason.
Other Sources
- [8] Language-dependent safety filter bypass. GitHub issue demonstrating that a Russian-language system prompt completely removes safety restrictions, with 100 percent reproducibility.
- [9] DeepSeek Database Exposure Coverage. InfoQ coverage of the Wiz ClickHouse incident, in which unauthenticated databases exposed chat history and API secrets.
- [10] DeepSeek Terms of Use. Vendor terms state that user inputs may be used to improve the model unless users opt out.