Mistral Le Chat Agent Security Risks

General Assistant Agents · mistral.ai · Exposed Giants

[Figure: AI Risk Quadrant position. Axes: Defense Controls (4), Attack Surface (5.64). Quadrants: Exposed Giants, Fortified Leaders, Humble Providers, Tight Operators.]

AIRQ Score: 4.54 (High)
Attack Surface: 5.64 (High)
Blast Radius: 5.87 (High)
Defense Controls: 4 (High)

About The Agent

Mistral Le Chat is a cloud-hosted general-purpose AI assistant developed by a Paris-based vendor, available through web, mobile, and API interfaces. The agent provides web search, sandboxed code execution, canvas collaboration, and agentic multi-step task execution through Work mode with MCP connectors bridging to more than a dozen enterprise platforms. The primary risk surface centers on demonstrated prompt injection and data exfiltration vulnerabilities: independent research achieved successful PII exfiltration on the production service, and the output layer carries no documented DLP or exfiltration channel blocking.

About the AI Risk Quadrant

Exposed Giants agents combine a moderate-to-elevated attack surface with weak default defense controls, meaning adversarial input has a viable path to exfiltrate data or trigger unintended actions before any guardrail intervenes. Mistral Le Chat fits this pattern: demonstrated exploitation research pushes several attack surfaces above base band while defense controls offer only partial input moderation, with no output guardrails or monitoring on the default configuration. Operators evaluating deployment should treat output-layer hardening and monitoring instrumentation as prerequisites, not optional add-ons.

1 Key Risks

This section summarizes the most critical security risks an operator inherits when deploying this agent in its documented default configuration. Demonstrated prompt injection and data exfiltration research against the production service combines with absent output filtering and no documented monitoring to create an input-to-egress attack chain that operators cannot observe.

Key Input Risks
Untrusted content from numerous channels including web results, file uploads, and enterprise connector payloads reaches the model without prompt injection filtering on the default configuration. Independent research demonstrated PII exfiltration at an 80% success rate [3]; operators should enable maximum-sensitivity moderation and block external-URL image rendering.
Key Execution Risks
Code Interpreter runs Python in a sandboxed environment with no internet access, but Work mode adds a bash sandbox and MCP tool execution without publicly documented red-team testing. Research demonstrated tool misuse attacks causing the agent to exfiltrate data through its web access capabilities.
Key Action Risks
Work mode gates sensitive connector operations behind approval prompts, but a progressive allowlist permanently exempts functions without expiration or re-authentication once the operator marks them trusted. Operators should audit the allowlist quarterly, revoke blanket exemptions, and restrict connector scope to the minimum required OAuth permissions.
Key Output Risks
Rich markdown output including image rendering, Canvas content, and MCP write functions provides multiple channels without documented output DLP, credential redaction, or URL sanitization. Research demonstrated data exfiltration through both markdown image rendering and email output channels on the production service.
Key Monitoring Risks
No public documentation describes structured audit logging, anomaly detection, or SIEM forwarding for agent conversations, leaving operators without visibility into tool invocations or abuse patterns. Data retention policies mention abuse-monitoring retention windows but offer no operator-facing observability or alerting tooling.

2 AIRQ Scores

The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. Mistral Le Chat shows a moderate composite driven by elevated input and output surfaces offset partially by sandboxed code execution and approval-gated MCP write operations.

AIRQ Metrics

The agent places in the Exposed Giants quadrant with attack surface above the midpoint, blast radius below the high-capability threshold, and defense controls covering less than a third of the available score range.

Each axis uses a different denominator: attack surface is measured out of ten, blast radius out of ten, defense controls out of fifteen, and the composite AIRQ score out of fifteen.

Metric | Score | Comments
AIRQ Score | 4.54 | The moderate composite reflects partial defense coverage that narrows the gap between capability exposure and available safeguards, leaving meaningful hardening headroom for operators.
Blast Radius | 5.87 / 10 | OAuth-scoped MCP connector tokens for enterprise services and outbound network access through web search and connector write functions drive the capability factors.
Attack Surface | 5.64 / 10 | Demonstrated exploitation on user input, tool execution, and output processing surfaces drives the attack score above midpoint with all three exfiltration preconditions met.
Defense Controls | 4 / 15 | Input moderation and sandboxed execution provide partial coverage while output guardrails and monitoring have nothing documented on the default configuration.
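Because the axes use different denominators, the raw scores are easier to compare once each is expressed as a fraction of its own maximum. The following is a minimal sketch of that arithmetic using only the scores and denominators stated in the table; the function and variable names are illustrative, and it does not attempt to reproduce how the composite AIRQ score is calculated.

```python
# Scores and denominators as stated in the AIRQ Metrics table.
SCORES = {
    "attack_surface": (5.64, 10),
    "blast_radius": (5.87, 10),
    "defense_controls": (4, 15),
}


def coverage(score: float, maximum: float) -> float:
    """Return a score as a fraction of its axis maximum."""
    return score / maximum


def normalized() -> dict[str, float]:
    """Express every axis on a common 0-to-1 scale, rounded to three places."""
    return {name: round(coverage(s, m), 3) for name, (s, m) in SCORES.items()}


if __name__ == "__main__":
    for axis, fraction in normalized().items():
        print(f"{axis}: {fraction:.1%}")
```

On this scale the defense controls fraction falls below one third, which matches the placement statement above.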

3 Attack Surface

Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. The agent ingests content from web search, uploaded documents, and MCP connectors while rendering rich markdown output and executing tools across a bash sandbox and enterprise service connectors.

Attack Surface Metrics

Higher scores indicate broader exposure to adversarial input and weaker boundary controls; surfaces scored at the ceiling (4 / 4) carry demonstrated exploitation evidence on the production service.

Each row names a distinct entry point into the agent's reasoning loop, its scored exposure level, and the evidence anchoring the assessment.

Surface | Score | Comments
User Input | 4 / 4 | Web, mobile, API, and file upload channels accept input with content-safety moderation but no dedicated prompt injection shield; adversarial prompts achieved successful PII exfiltration on the production service. [3][5]
External Data | 3 / 4 | MCP connectors ingest data from enterprise platforms including email, code repositories, and project management tools; web search fetches content from untrusted URLs without documented content validation. [8]
Memory | 2 / 4 | Custom instructions persist across sessions as manual-only writes; the opt-in Memories feature adds automated cross-session saving but remains disabled on the default configuration. [9]
Reasoning | 2 / 4 | Multi-step reasoning operates within conversation scope with model selection but no documented chain-of-thought audit or reasoning transparency for the operator. [4][5]
Planning | 2 / 4 | Work mode decomposes tasks into tool calls with user-visible execution steps and no plan-level approval gate; planning scope stays within the active conversation. [5][6]
Tool Execution | 4 / 4 | Code Interpreter, bash sandbox, web search, and MCP write functions compose a multi-tool surface; research demonstrated tool misuse for data exfiltration on the production service. [3][6]
Orchestration | 2 / 4 | Multi-step task execution runs within user-supervised sessions with parallel tool calls; no background processes, scheduling, or headless daemon operation is documented. [5][6]
Inter-Agent | 2 / 4 | MCP connectors bridge the agent to external services including custom server URLs with no documented inter-agent message integrity verification or authentication protocol. [8]
Output Processing | 4 / 4 | Rich markdown with image rendering and MCP write functions carry no output DLP; research demonstrated exfiltration via markdown images and email-based output encoding on the production service. [1][3]
Configuration | 2 / 4 | Custom instructions and connector settings are managed through the settings UI with explicit user action; a vendor SDK supply chain incident and custom MCP server URLs highlight supply chain vetting gaps. [2][5]

The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. Le Chat ingests adversary-authored content through web search and MCP connectors, holds OAuth-scoped tokens for enterprise services, and renders rich markdown output that has been demonstrated as an exfiltration channel.

Lethal Trifecta · Complete (3 of 3)

Mistral Le Chat exhibits all three of these conditions in its documented default configuration:

  • Untrusted input — Content fetched via web search, user-uploaded files, and enterprise connector payloads from external platforms carry adversary-authored bytes into the reasoning context. [5][8]
  • Sensitive data — OAuth-scoped MCP connectors access private enterprise data from code repositories, email inboxes, project boards, and payment platforms on behalf of the operator. [8]
  • External egress — Markdown image rendering, web search outbound requests, and MCP connector write functions that send email and post to collaboration platforms provide default egress channels. [1][3]
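The three-condition test above can be expressed as a small check that an operator might run against each planned deployment profile. This is a sketch only: the `SessionCapabilities` type and its field names are hypothetical and are not part of any Mistral API or of the AIRQ methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCapabilities:
    """Hypothetical summary of what a single agent session can do."""
    reads_untrusted_content: bool  # e.g. web search results, uploaded files
    accesses_private_data: bool    # e.g. OAuth-scoped connector data
    can_egress_externally: bool    # e.g. image rendering, outbound email


def trifecta_count(caps: SessionCapabilities) -> int:
    """Count how many of the three conditions hold for this session."""
    return sum(
        [
            caps.reads_untrusted_content,
            caps.accesses_private_data,
            caps.can_egress_externally,
        ]
    )


def is_lethal_trifecta(caps: SessionCapabilities) -> bool:
    """True only when all three conditions hold in the same session."""
    return trifecta_count(caps) == 3


# Default configuration as described in this report: all three conditions met.
default_profile = SessionCapabilities(True, True, True)
# A hardened profile that removes external egress breaks the chain.
no_egress_profile = SessionCapabilities(True, True, False)
```

Removing any one condition is enough to make the check return false, which is why the hardening tips later in the report concentrate on the egress channels.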

4 Blast Radius

The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. A successful compromise reaches sandboxed code execution, scoped file access, OAuth-scoped enterprise credentials, and outbound network connectivity through MCP connectors and web search.

Blast Radius Metrics

Higher blast scores indicate broader reach from a compromised agent session, with network and credential factors elevated by the MCP connector framework's OAuth integration scope.

Each row ties a scored capability factor to the specific workflow node, OAuth scope, or sandbox boundary that determines how far a compromised session can reach.

Factor | Score | Comments
Code execution | 2 / 4 | Sandboxed Python Code Interpreter has no internet access; Work mode adds a scoped bash sandbox for multi-step agentic tool execution within the operator's session. [5][6]
File system access | 2 / 4 | File access is scoped to uploaded documents within the current conversation; no access to the operator's local file system or home directory is documented. [5]
Network access | 3 / 4 | MCP connectors make outbound calls to enterprise APIs; web search fetches content from the open web; Code Interpreter has no network access by design. [6][8]
Credential access | 3 / 4 | OAuth tokens for enterprise platforms including code repositories, email, project boards, and payment processors are held by the MCP connector framework with per-function authorization. [8]
Autonomous action | 2 / 4 | Work mode selects tools autonomously and gates write operations behind approval prompts; the progressive allowlist can permanently exempt individual functions from future approval. [6]
Deployment access | 2 / 4 | Vibe Code Workflow creates draft pull requests on connected code repositories but has no direct infrastructure deployment or package publishing capability documented. [5]

5 Defense Controls

Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. Default controls provide ML-based input moderation and sandboxed code execution while leaving output filtering, exfiltration prevention, and operator-facing monitoring entirely undocumented.

Defense Controls Metrics

Higher scores indicate stronger vendor-implemented safeguards on the default configuration, with the inverted coloring reflecting the gap between available protection and maximum coverage.

Each component is scored against what the vendor documents for the default configuration, with confidence reflecting the evidence tier for the control's existence.

Component | Score | Comments
Input Guardrails | 1 / 3 | ML-based moderation classifier covers jailbreaking detection among nine policy categories, but no dedicated prompt injection shield or instruction hierarchy separation is documented on the default configuration. [7]
Execution Isolation | 2 / 3 | Code Interpreter runs in a vendor-documented sandboxed environment with no internet access; specific container technology and escape testing results are not publicly available. [5][10]
Action Controls | 1 / 3 | Approval gates require operator confirmation for MCP connector write operations, but the session-persistent progressive allowlist permanently exempts approved functions without automatic expiry or re-authentication, meeting the downgrade criteria. [6]
Output Guardrails | 0 / 3 | No output-specific DLP, exfiltration blocking, credential redaction, or URL sanitization is documented; research demonstrated data exfiltration through markdown rendering and email output. [1][3]
Monitoring | 0 / 3 | The vendor privacy policy references data retention for abuse monitoring but no operator-facing audit logging, SIEM forwarding, or anomaly detection is publicly documented. [10][11]
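The five component scores above add up to the headline Defense Controls value of 4 out of 15. The report does not state how the headline is aggregated, so treating it as a plain sum is an assumption that happens to be consistent with these numbers; the sketch below only checks that arithmetic.

```python
# Component scores as listed in the Defense Controls table.
COMPONENT_SCORES = {
    "Input Guardrails": 1,
    "Execution Isolation": 2,
    "Action Controls": 1,
    "Output Guardrails": 0,
    "Monitoring": 0,
}
MAX_PER_COMPONENT = 3  # each component is scored out of three

# Assumption: the headline is a simple sum of the component scores.
total = sum(COMPONENT_SCORES.values())
maximum = MAX_PER_COMPONENT * len(COMPONENT_SCORES)

# Components with no documented control on the default configuration.
undocumented = [name for name, score in COMPONENT_SCORES.items() if score == 0]

if __name__ == "__main__":
    print(f"Defense Controls: {total} / {maximum}")
    print("No documented control:", ", ".join(undocumented))
```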

6 Hardening Tips

This section lists concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. Operators should prioritize output-layer exfiltration controls, monitoring instrumentation, and tightening the progressive approval allowlist to close the demonstrated input-to-egress attack chain.

Input Guardrails

Input guardrails intercept adversarial content before it reaches the reasoning loop.

  • Policy: Require all MCP connector inputs to pass through a secondary prompt injection classifier before reaching the model reasoning context.
  • Configuration: Configure the moderation API thresholds to maximum sensitivity for the jailbreaking category across all enterprise deployments.
  • Engineering: Deploy a dedicated prompt shield or instruction hierarchy separation layer between system prompts and user-provided or tool-retrieved content.
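As an illustration of where a secondary screen for connector inputs could sit, the sketch below flags payloads that match a few well-known injection phrasings before they are handed to the model. The pattern list and function names are hypothetical, and regular expressions are a stand-in for the trained classifier the tip calls for; they are easy to evade and should not be treated as sufficient on their own.

```python
import re

# Illustrative heuristics only; a real deployment would use a trained classifier.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the )?(system|previous) prompt", re.I),
    # Remote image markup inside retrieved content is a common exfiltration setup.
    re.compile(r"!\[[^\]]*\]\(\s*https?://", re.I),
]


def screen_connector_payload(text: str) -> list[str]:
    """Return the patterns that matched; an empty list means no heuristic hit."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]


def should_quarantine(text: str) -> bool:
    """Hold the payload for review instead of passing it to the model."""
    return bool(screen_connector_payload(text))
```

A payload that is quarantined can be dropped, summarized by a separate unprivileged model, or routed to a human, depending on the operator's policy.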

Execution Isolation

Execution isolation contains what a compromised agent can do on the host.

  • Policy: Restrict Work mode bash sandbox capabilities to a curated command allowlist and disable shell access for non-administrative user roles.
  • Configuration: Set per-session file upload limits and enforce automatic workspace purge after each Code Interpreter run to prevent data accumulation across tasks.
  • Engineering: Implement container escape detection and runtime security monitoring for all sandbox execution environments used by the agent.
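The command allowlist idea can be sketched as a check applied to each command line before it reaches a shell. The allowed set below is hypothetical, and the report does not document a hook in Work mode where such a check could be installed, so this assumes an operator-controlled layer in front of the sandbox.

```python
import shlex

# Hypothetical curated set of read-only commands.
ALLOWED_COMMANDS = {"ls", "cat", "grep", "head", "wc"}

# Characters that allow chaining, redirection, or substitution in a shell.
SHELL_METACHARACTERS = set(";|&<>`$\n")


def is_command_allowed(command_line: str) -> bool:
    """Allow a single allowlisted command with no shell metacharacters."""
    if any(ch in SHELL_METACHARACTERS for ch in command_line):
        return False
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes or similar parse errors are rejected outright.
        return False
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS
```

Rejecting metacharacters before tokenizing keeps the check conservative: a command such as `ls; rm -rf /` fails even though its first word is on the allowlist.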

Action Controls

Action controls govern which tools and actions the agent can invoke autonomously.

  • Policy: Require periodic re-authentication for MCP connector write operations and disable the progressive allowlist feature that permanently bypasses approval.
  • Configuration: Configure per-connector scope restrictions to limit which MCP functions are available to each organizational role.
  • Engineering: Build a deny-by-default policy engine that requires explicit operator approval for each new MCP connector function category.
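A deny-by-default policy with expiring approvals addresses the permanent-exemption problem described earlier. The sketch below is one possible shape for such an engine; the class, the 90-day lifetime, and the connector and function names are all hypothetical and do not correspond to any documented Le Chat setting.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

APPROVAL_TTL = timedelta(days=90)  # hypothetical approval lifetime


@dataclass
class ApprovalPolicy:
    """Deny by default; approvals are explicit and expire automatically."""
    approvals: dict[tuple[str, str], datetime] = field(default_factory=dict)

    def approve(self, connector: str, function: str, now: datetime) -> None:
        """Record an operator approval that lapses after APPROVAL_TTL."""
        self.approvals[(connector, function)] = now + APPROVAL_TTL

    def revoke(self, connector: str, function: str) -> None:
        """Remove an approval immediately, for example after an audit."""
        self.approvals.pop((connector, function), None)

    def is_allowed(self, connector: str, function: str, now: datetime) -> bool:
        """Allow only functions with an unexpired approval on record."""
        expires = self.approvals.get((connector, function))
        return expires is not None and now < expires
```

Usage: a call such as `policy.is_allowed("email", "send", datetime.now(timezone.utc))` returns false until an operator approves that function, and returns false again once the approval lapses.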

Output Guardrails

Output guardrails inspect what the agent sends to other systems and users.

  • Policy: Require all agent output to pass through a DLP classifier that blocks credential patterns, PII, and encoded data before rendering.
  • Configuration: Disable markdown image rendering from external URLs and block Base64-encoded content in all agent output channels.
  • Engineering: Deploy an output sanitization layer that strips or rewrites URLs in markdown images and hyperlinks to prevent rendering-based exfiltration.
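Rendering-based exfiltration depends on the client fetching an attacker-chosen image URL, so one mitigation is to remove inline markdown images whose host is not explicitly trusted. The following is a minimal sketch under that assumption; the allowlisted host is a placeholder, the regular expression covers only inline `![alt](url)` syntax (not reference-style images or raw HTML), and a production sanitizer would work on a parsed document rather than on text.

```python
import re
from urllib.parse import urlparse

# Placeholder host; replace with the operator's own trusted asset domains.
ALLOWED_IMAGE_HOSTS = {"assets.example.com"}

# Inline markdown image: ![alt](url "optional title")
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)")


def _host_of(url: str) -> str:
    """Return the lowercase hostname, or an empty string if it cannot be parsed."""
    try:
        return urlparse(url).hostname or ""
    except ValueError:
        return ""


def strip_external_images(markdown: str) -> str:
    """Replace images from non-allowlisted hosts with a text placeholder."""

    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        if _host_of(url) in ALLOWED_IMAGE_HOSTS:
            return match.group(0)
        return f"[image removed: {alt}]"

    return IMAGE_PATTERN.sub(replace, markdown)
```

Relative and unparseable URLs are removed as well, which keeps the filter deny-by-default at the cost of dropping some legitimate images.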

Monitoring

Monitoring captures what the agent did and surfaces anomalies for review.

  • Policy: Mandate weekly review of agent conversation logs for prompt injection indicators, abnormal connector call volumes, and unauthorized data-access patterns, with SIEM forwarding and 90-day retention.
  • Configuration: Enable conversation-level monitoring dashboards that surface anomalous patterns such as repeated tool failures or unusual data access volumes.
  • Engineering: Implement behavioral anomaly detection that alerts on prompt injection signatures, exfiltration patterns, and privilege escalation attempts across agent sessions.
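Since no operator-facing audit log is documented, operators who proxy or wrap agent tool calls would need to emit their own records. The sketch below shows one possible JSON-lines event format and a simple volume check for abnormal connector call counts; the field names, tool identifiers, and threshold are hypothetical, and real anomaly detection would use baselines rather than a fixed cutoff.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from typing import Optional


def audit_event(
    session_id: str, tool: str, target: str, now: Optional[datetime] = None
) -> str:
    """Serialize one tool invocation as a JSON line suitable for log forwarding."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    record = {
        "timestamp": timestamp,
        "session_id": session_id,
        "tool": tool,
        "target": target,
    }
    return json.dumps(record, sort_keys=True)


def sessions_over_threshold(events: list[dict], threshold: int) -> set[str]:
    """Return session IDs whose tool-call count exceeds the threshold."""
    counts = Counter(event["session_id"] for event in events)
    return {sid for sid, n in counts.items() if n > threshold}
```

Each line produced by `audit_event` can be appended to a file or stream that a SIEM collector already tails, which keeps the instrumentation independent of any vendor-side logging.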

7 References

This section lists the evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [3][5]) refer to entries below, listed in citation order.

Selected Vulnerabilities

  1. Le Chat data exfiltration via email. 0din.ai disclosure: indirect prompt injection via MCP email on production Le Chat (2026-03-05).
  2. Mistral SDK supply chain (GHSA-wx9m-wx4f-4cmg). CVSS 9.6: malicious dropper in mistralai PyPI 2.4.6 via TanStack supply chain (2026-05).

Selected Research

  3. Imprompter: Tricking LLM Agents into Improper Tool Use. PII exfiltration on production Mistral Le Chat with 80% success rate (2024).
  4. ChatInject: Chat Template Abuse for Prompt Injection. Class-level: 32-52% attack success rate on frontier LLMs including Mistral models (2025).

Vendor Documentation

  5. Le Chat product overview. Vendor docs: Le Chat tools and modes.
  6. Work mode documentation. Multi-step agent harness with bash sandbox and approval gates.
  7. Moderation and Guardrailing. ML-based classifier with 9 policy categories including jailbreaking.
  8. MCP Connectors documentation. 20+ connectors plus custom MCP server support.
  9. Memories documentation. Opt-in cross-session persistent memory with graph-based architecture.

Other Sources

  10. Mistral AI Trust Center. Vendor trust and security resources hub.
  11. Mistral AI Privacy Policy. Data retention and training opt-out policies.