Gemini Agent Security Risks

General Assistant Agents gemini.google.com Fortified Leaders
AI RISK QUADRANT POSITION DEFENSE CONTROLS (9) ATTACK SURFACE (5.52) EXPOSED GIANTS FORTIFIED LEADERS HUMBLE PROVIDERS TIGHT OPERATORS
AIRQ Score
4.17
High
Attack Surface
5.52
High
Blast Radius
3.63
Medium
Defense Controls
9
Medium
About The Agent

Gemini is a multimodal AI assistant integrated across Google Workspace, Android, and the web that processes user email, calendar, documents, and files through extensions while executing actions on behalf of the authenticated user within the Google ecosystem.

About the AI Risk Quadrant

Fortified Leaders agents combine a wide attack surface with moderate blast radius. Gemini's elevated input exposure from Workspace extension channels and demonstrated exfiltration paths place it in this quadrant despite meaningful vendor-deployed defenses.

1 Key Risks

The most critical security risks an operator inherits when deploying this agent in its documented default configuration. Gemini faces elevated input-channel risk from indirect prompt injection through Workspace extensions combined with broad read access to private data and an acknowledged unfixed exfiltration path through rendered hyperlinks, compounded by a documented supply chain compromise of the CLI development repository [3].

Key Input Risks
Indirect prompt injection via Workspace extensions processes attacker-authored calendar invitations, email bodies, and shared documents without content isolation, enabling command execution within the user's authenticated session [4][9].
Key Execution Risks
Persistent memory poisoning through injected instructions in processed content enables long-term manipulation of assistant responses, with no sandboxed execution boundary isolating Gemini processing from user data access [4][12].
Key Action Risks
Read access to Gmail, Drive, Calendar, and Keep proceeds without per-action confirmation, exposing private communications and files to exfiltration through demonstrated multi-layer trust boundary bypasses [7][12]; the CLI variant carried a workspace trust vulnerability enabling unauthenticated remote code execution [1][14].
Key Output Risks
One-click data exfiltration through rendered markdown hyperlinks remains an acknowledged unfixed path; the zero-click image variant is blocked but hyperlink-based leakage persists by design [6][11].
Key Monitoring Risks
Enterprise Workspace deployments provide per-interaction audit trails through Admin Console, but consumer accounts lack equivalent visibility and no real-time alerting exists for anomalous data access patterns within a session [12].

2 AIRQ Scores

The four headline scores quantify how exposed the agent is, how damaging a successful attack would be, and how much the agent’s own controls reduce that risk. Gemini scores below the median resilience threshold due to the breadth of untrusted input channels and demonstrated data exfiltration paths offsetting the vendor's investment in content classifiers and adversarial model hardening.

AIRQ Metrics

The Fortified Leaders placement reflects an agent with more attack surface exposure than its blast radius alone would suggest. Gemini's input channels span arbitrary third-party content ingested through Workspace extensions, while the blast radius is bounded by the absence of arbitrary code execution and the constraint to Google-controlled APIs. The attack surface score sits 0.52 above the quadrant boundary, placing this agent in proximity to a reclassification threshold.

Scores are computed from per-component assessments documented in the sections below. Attack Surface aggregates ten input channels; Blast Radius aggregates six capability factors; Defense Controls aggregates five protection categories.

Metric Score Comments
AIRQ Score 4.17 Resilience below median threshold driven by the breadth of untrusted input channels processing third-party content with demonstrated bypass paths.
Blast Radius 3.63 / 10 Workspace data access across email, files, and calendar drives the blast radius, constrained by absence of arbitrary code execution.
Attack Surface 5.52 / 10 Four surfaces at or near ceiling driven by demonstrated indirect prompt injection and markdown exfiltration research [4][6].
Defense Controls 9 / 15 Content classifiers and adversarial fine-tuning deployed [5][10]; output guardrails carry the primary gap via unfixed hyperlink exfiltration [11].

3 Attack Surface

Attack surfaces are the entry points and interaction patterns through which adversarial input can reach the agent’s reasoning loop and steer its behavior. Gemini ingests untrusted content through Workspace extensions including email, calendar invitations, and shared documents [16]. Four surfaces reach adjusted ceiling or near-ceiling scores from demonstrated indirect prompt injection and exfiltration research.

Attack Surface Metrics

Scores range from 0 (no exposure) to 4 (adjusted ceiling with agent-specific evidence of exploitation). Evidence penalties apply only where agent-specific research or vendor acknowledgment documents the attack path.

Each surface is scored against observable conditions documented in the research dossier. Base scores reflect architectural exposure; evidence penalties reflect agent-specific exploitation demonstrations.

Surface Score Comments
User Input 4 / 4 Indirect prompt injection through calendar invitations and email bodies reaches the authenticated session; vendor defense strategy acknowledges residual bypass rate after adversarial fine-tuning [4][5][9].
External Data 4 / 4 Workspace extensions ingest third-party documents and web content through trust boundaries that have been bypassed via multi-layer attack chains and image scaling manipulation [7][8][11].
Memory 3 / 4 Persistent memory poisoning enables long-term manipulation through injected instructions in processed content; the vendor's layered defense acknowledges this vector [4][9].
Reasoning 2 / 4 Standard transformer reasoning with adversarial fine-tuning applied [5]; no agent-specific reasoning chain manipulation distinct from prompt injection has been documented independently.
Planning 2 / 4 No multi-step planning autonomy beyond single-turn action proposals; the confirmation framework for write operations bounds the planning attack surface [9].
Tool Execution 2 / 4 Extensions execute within pre-defined Google API capabilities [12]; no arbitrary tool invocation or dynamic tool registration is available to end users by default.
Orchestration 2 / 4 Single-model orchestration without chained sub-agent delegation; Workspace extensions are pre-configured rather than dynamically composed at runtime [12].
Inter-Agent 2 / 4 No default inter-agent communication protocol in the consumer assistant [12]; the product operates as a single-model system without delegation to third-party agents.
Output Processing 4 / 4 Markdown rendering enables one-click data exfiltration via hyperlinks; the vendor acknowledged the hyperlink variant remains unfixed while the zero-click image variant is sanitized [6][7][11].
Configuration 2 / 4 Workspace admin controls govern extension availability and data access policies for enterprise deployments [12]; the consumer configuration surface exposes no additional security-relevant toggles.

The Lethal Trifecta is triggered when an agent processes untrusted content, accesses private data, and communicates externally in the same session — the three conditions that turn an isolated prompt injection into full-chain exfiltration. Gemini ingests arbitrary third-party content through Workspace extensions, reads private email and files on behalf of the user, and can transmit data externally through rendered hyperlinks and drafted messages.

Lethal Trifecta · Complete (3 of 3)

Gemini exhibits all three of these conditions in its documented default configuration:

  • Untrusted input — Workspace extensions process email bodies, calendar invitations, and shared documents authored by arbitrary third parties without content isolation between trusted and untrusted input [4][9].
  • Sensitive data — The assistant reads Gmail messages, Drive files, Calendar events, Keep notes, and Tasks entries through extension access covering private and privileged user data [12].
  • External egress — Default output channels include drafted emails, calendar events with external attendees, rendered hyperlinks in responses, and Colab notebooks with outbound network access [6][7][11].

4 Blast Radius

The blast radius is what an attacker who controls the agent can reach — which systems they touch, which credentials they read, and which actions they take without operator approval. Gemini's blast radius is concentrated in Workspace data access spanning email messages, document files, calendar entries, and task lists, with no arbitrary code execution and no deployment infrastructure capabilities. The scope covers both personal and professional data held in the authenticated account but is bounded by Google-controlled APIs.

Blast Radius Metrics

Scores range from 0 (no capability) to 4 (unrestricted capability in default configuration). Each factor measures the worst-case impact of a compromised session.

Factors measure the extent of damage achievable through a compromised Gemini session operating within its default permission envelope.

Factor Score Comments
Code execution 0 / 4 No arbitrary code execution capability in the consumer assistant [1]; Colab integration exists but requires explicit user navigation away from the conversation interface.
File system access 3 / 4 Read access spans Gmail, Drive, Calendar, Keep, and Tasks through Workspace extensions, representing broad exposure of personal and professional data within the authenticated account [2][12].
Network access 2 / 4 Outbound communication is limited to Google-controlled APIs and rendered hyperlinks [11]; no raw socket or arbitrary HTTP request capability exists in the default configuration.
Credential access 2 / 4 Operates within the user's existing OAuth session scope [12]; no credential storage, vault access, or cross-service token generation beyond the authenticated Google account boundary.
Autonomous action 2 / 4 Can draft emails, create calendar events, and modify Keep notes [9]; write operations require user confirmation for send actions but read operations proceed without approval.
Deployment access 0 / 4 No infrastructure provisioning, deployment pipeline, or cloud resource management capabilities exist in the consumer assistant configuration [12].

5 Defense Controls

Defense controls are what the agent’s own architecture does to detect, contain, and report attacks before they reach the operator’s systems. Gemini deploys content classifiers, adversarial model hardening, and a markdown sanitizer by default, but output guardrails carry a documented gap in hyperlink-based exfiltration that the vendor has acknowledged as unfixable within current architectural constraints.

Defense Controls Metrics

Scores use an inverted scale: 3 indicates strong default protection, 0 indicates no protection in place. Confidence markers indicate the strength of evidence supporting the score.

Each component is scored against the default configuration. Non-default hardening options are documented in the hardening tips section rather than reflected in the baseline score.

Component Score Comments
Input Guardrails 2 / 3 Content classifiers, security thought reinforcement, and adversarial fine-tuning reduce attack success rates by 47% on average while targeted attacks still achieve 94.6% bypass in specific scenarios [5][9][10][13].
Execution Isolation 2 / 3 Processing occurs within Google infrastructure with request-level isolation between extension API calls and per-session boundaries preventing cross-user data access [12]; no user-configurable sandbox separates data access from action execution within a single session.
Action Controls 2 / 3 User confirmation framework gates email sending and calendar event creation; read operations across all Workspace data proceed without per-action approval [9][12].
Output Guardrails 1 / 3 Markdown sanitizer blocks zero-click image-based exfiltration but one-click hyperlink exfiltration remains unfixed; URL provenance provides partial protection for the zero-click path only [6][11].
Monitoring 2 / 3 Enterprise Workspace audit logs record Gemini interactions for admin review; individual user accounts have no per-session audit trail and no anomaly detection alerts for suspicious access patterns [12][15].

6 Hardening Tips

Concrete actions an operator can take to reduce the risks reported above, grouped by which defense control each tip strengthens. Operators can reduce residual risk through Workspace admin policy configuration, extension access scoping, and integration with existing security monitoring infrastructure.

Input Guardrails

Input guardrails intercept adversarial content before it reaches the reasoning loop.

Input Guardrails
  • Policy Restrict Gemini extension access to trusted organizational units only and disable extensions for shared mailboxes where indirect injection risk from external senders is unacceptable.
  • Configuration Configure Workspace DLP rules to quarantine or flag documents matching known injection patterns before Gemini processes them through the existing content compliance infrastructure.
  • Engineering Deploy pre-processing rules at the organizational boundary that strip or sanitize suspicious instruction patterns in inbound email bodies and shared document content before Gemini ingestion.

Execution Isolation

Execution isolation contains what a compromised agent can do on the host.

Execution Isolation
  • Policy Limit Gemini availability to specific organizational units by configuring Workspace admin policies that restrict which users and groups can invoke Gemini features on sensitive data stores.
  • Configuration Disable Gemini extensions for service accounts and high-privilege administrative accounts where the blast radius of a compromised session extends beyond individual user data.
  • Engineering Segment data access by enabling Gemini for non-sensitive organizational units initially and expanding access incrementally after observing access patterns in enterprise audit logs.

Action Controls

Action controls govern which tools and actions the agent can invoke autonomously.

Action Controls
  • Policy Enable the strictest confirmation mode for all Gemini write operations including email drafts, calendar modifications, and Keep note creation to maximize human oversight of outbound actions.
  • Configuration Implement organizational policies requiring elevated authentication before Gemini can execute actions affecting external recipients or modifying shared resources.
  • Engineering Configure conditional access policies that require step-up verification when Gemini-initiated communications target domains outside the organization's trusted contact list.

Output Guardrails

Output guardrails inspect what the agent sends to other systems and users.

Output Guardrails
  • Policy Disable Colab integration for users who do not require code execution capabilities, eliminating the demonstrated markdown sanitizer bypass path through notebook export [7].
  • Configuration Configure browser-level content security policies for Workspace domains to restrict outbound image loading and link navigation patterns from rendered Gemini responses.
  • Engineering Monitor click-through rates on Gemini-generated hyperlinks using existing web proxy logs to detect anomalous exfiltration patterns through the acknowledged one-click path [11].

Monitoring

Monitoring captures what the agent did and surfaces anomalies for review.

Monitoring
  • Policy Enable Gemini-specific audit log alerting in Workspace Admin Console with filters for bulk data access patterns and unusual extension invocation sequences across user accounts.
  • Configuration Integrate Gemini interaction logs with existing SIEM infrastructure to correlate assistant activity with identity signals and detect anomalous data retrieval patterns.
  • Engineering Implement periodic access reviews auditing which documents, emails, and calendar entries were accessed through the assistant within each billing period to detect unauthorized data exposure.

7 References

The evidence base behind every score and finding in the profile, grouped by source type so the reader can verify any claim. Numbers in brackets throughout the report (e.g. [7, 13]) refer to entries below, listed in citation order.

Selected Vulnerabilities

  1. GHSA-wpqr-6v78-jr5g Workspace trust and tool allowlisting bypass enabling unauthenticated RCE in Gemini CLI headless and CI/CD environments (CVSS 10.0). Patched in @google/gemini-cli 0.39.1 and run-gemini-cli Action 0.1.22.
  2. GHSA-wpqr-6v78-jr5g (OSV mirror) OSV cross-database record confirming the Gemini CLI RCE advisory scope and patch availability across the npm ecosystem.
  3. Pillar Security TrustIssues Supply chain compromise of gemini-cli repo via AI-powered GitHub workflow prompt injection exposing API keys and OIDC credentials.

Selected Research

  1. Invitation Is All You Need 14 indirect prompt injection scenarios against Gemini Workspace assistants demonstrated at Black Hat USA 2025.
  2. Lessons from Defending Gemini Google DeepMind adversarial evaluation documenting 47% ASR reduction for Gemini 2.5 while TAP still achieves 94.6% in Calendar scenario.
  3. Exploiting Markdown Injection in AI Agents Markdown injection for one-click data exfiltration in Gemini; Google acknowledged fix infeasible.
  4. Hacking Gemini: A Multi-Layered Approach Multi-layer markdown sanitizer bypass via Colab export path demonstrated at bugSWAT Tokyo 2025.
  5. Weaponizing Image Scaling Against Production AI Image scaling reveals hidden prompt injections in Gemini CLI and web interface enabling exfiltration via MCP.

Vendor Documentation

  1. Google Layered Defense Against Indirect Prompt Injection Google Workspace documentation on layered defenses including content classifiers and markdown sanitization.
  2. Mitigating Prompt Injection Attacks Google security blog on multi-layer prompt injection mitigation including Gemini 2.5 model hardening.
  3. Mitigating URL-based Exfiltration in Gemini Google Bug Hunters on markdown sanitizer design and URL provenance blocking zero-click exfiltration.
  4. Generative AI in Google Workspace Privacy Hub Workspace privacy documentation on data handling, retention, and audit logs for Gemini interactions.
  5. Gemini 3.1 Pro Model Card Model card with safety evaluations, red teaming results, and Frontier Safety Framework assessments.

Other Sources

  1. Google Fixes CVSS 10.0 Vulnerability in Gemini CLI The Register on critical Gemini CLI workspace trust vulnerability and emergency patch.
  2. Google Cloud ISO/IEC 42001 Certification Google Cloud confirming ISO/IEC 42001:2023 certification for AI management system.
  3. Prompt Injection in Google Gemini Apps The Register on Black Hat USA 2025 disclosure of practical indirect prompt injection against Gemini.