Pickle in the Middle: Critical RCE Flaw in Google Ve...

Executive Summary

Palo Alto Networks' Unit 42 discovered a critical vulnerability in the Google Cloud Vertex AI Python SDK (google-cloud-aiplatform) that permitted cross-tenant remote code execution (RCE). The vulnerability, present in SDK versions 1.139.0 and 1.140.0, allowed an attacker with no initial access to a victim's environment to hijack and poison machine learning (ML) models during the upload process. The core of the issue was a predictable default naming pattern for staging buckets combined with a lack of ownership verification in the SDK.

An attacker could exploit this by predicting and pre-creating a Google Cloud Storage (GCS) bucket (a technique known as 'bucket squatting'). When a victim using a vulnerable SDK version uploaded a model without specifying a custom staging location, the SDK would inadvertently send the model artifacts to the attacker-controlled bucket. The attacker could then replace the legitimate model with a malicious one, leveraging Python's pickle deserialization to achieve RCE when the victim deployed the compromised model. Google has addressed this vulnerability in SDK version 1.148.0 following responsible disclosure from Unit 42.

Vulnerability Details

The vulnerability, dubbed 'Pickle in the Middle,' is a multi-stage attack that hinges on three key components:

Predictable GCS Bucket Name: When a user uploads a model via the SDK's aiplatform.Model.upload() function without specifying a staging_bucket, the SDK generates a default bucket name using a deterministic pattern: [project-id]-[region]-vertex-ai-staging. An attacker only needs the victim's project ID and region to predict this name.
Bucket Squatting and Missing Ownership Check: GCS bucket names are globally unique. An attacker can preemptively create a bucket with the predicted name in their own Google Cloud project. The vulnerable SDK versions checked for the bucket's existence but failed to verify that it belonged to the user's project. Consequently, the SDK would proceed to upload the victim's model artifacts to the attacker's bucket, assuming it was a legitimate staging area.
Malicious Model Replacement and Deserialization RCE: The attacker can then, within a narrow time window, replace the victim's uploaded model files (e.g., model.joblib) with a malicious version. Since many Python ML models are serialized using pickle or its wrapper joblib, the attacker can craft a malicious model file. This file, when deserialized by the Vertex AI serving infrastructure using pickle.load() or joblib.load(), executes arbitrary code via the __reduce__ method. This provides the RCE payload delivery mechanism.
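The deserialization step in the third component can be illustrated with a deliberately harmless sketch. It assumes nothing about Vertex AI itself: the class name Demo and the choice of len as the callable are hypothetical stand-ins, used only to show that pickle invokes whatever callable __reduce__ names at load time.

```python
import pickle


class Demo:
    """Hypothetical stand-in for a tampered model object (benign)."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair in the byte stream.
        # On load, the callable is invoked with args. An attacker would
        # name a dangerous callable here; this sketch uses len instead.
        return (len, ("attacker-chosen",))


blob = pickle.dumps(Demo())

# Loading does not rebuild a Demo instance. It calls len("attacker-chosen")
# and returns whatever that call returns.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The loaded value is the return value of the named callable rather than a model object, which is why loading an untrusted pickle or joblib file is equivalent to running untrusted code.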

The attack flow is as follows:

Attacker identifies a target's Google Cloud project ID and the region the target uses.
Attacker creates a GCS bucket in their own project named [victim-project-id]-[region]-vertex-ai-staging.
Victim, using a vulnerable SDK, uploads an ML model without specifying a staging bucket.
The SDK silently uploads the model artifacts to the attacker's bucket.
Attacker replaces the legitimate model file with a malicious pickled object designed for RCE.
Victim deploys the now-compromised model to a Vertex AI endpoint.
The Vertex AI service loads the malicious model, deserializes the pickle file, and executes the attacker's code within the victim's serving infrastructure.
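Because the first two steps depend only on string construction, a defender can enumerate the names worth auditing. The following is a minimal sketch that rebuilds the pattern quoted in this article; the function names, the project ID, and the region list are hypothetical, and this is not the SDK's own code.

```python
def default_staging_bucket(project_id: str, region: str) -> str:
    """Rebuild the default name pattern described in the article."""
    return f"{project_id}-{region}-vertex-ai-staging"


def names_to_audit(project_id: str, regions: list[str]) -> list[str]:
    """List every default-pattern name for the regions a team uses."""
    return [default_staging_bucket(project_id, region) for region in regions]


# Hypothetical project ID and regions, for illustration only.
candidates = names_to_audit("example-ml-prod", ["us-central1", "europe-west4"])
for name in candidates:
    print(name)
```

Each resulting name can then be checked to confirm that, if the bucket exists, it lives in a project the organization controls.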

Affected Systems

Product: Google Cloud Vertex AI Python SDK (google-cloud-aiplatform)
Vulnerable Versions: 1.139.0 and 1.140.0
Patched Version: 1.148.0 and later

Organizations using Vertex AI for MLOps pipelines are urged to check their dependencies and ensure they are not running the affected versions.

Exploitation Status

The vulnerability was discovered and demonstrated through a proof-of-concept by Unit 42 researchers. There is no public evidence of this specific technique being exploited in the wild. However, 'bucket squatting' is a known attack class, and with the public disclosure, threat actors may attempt to find and exploit unpatched systems.

Impact Assessment

A successful exploit of this vulnerability has severe security implications, leading to a complete compromise of the targeted ML model's serving environment. The business impact includes:

Cross-Tenant Remote Code Execution: The primary impact is gaining a foothold within the victim's cloud infrastructure, running with the permissions of the Vertex AI service account. This bypasses tenant isolation, a fundamental security promise of cloud platforms.
Data Exfiltration: Once RCE is achieved, the attacker can use the service account's permissions to access and exfiltrate sensitive data from GCS buckets, BigQuery datasets, or other resources accessible to the Vertex AI service.
Lateral Movement: The attacker can leverage the compromised environment as a pivot point to move laterally within the victim's Google Cloud project and potentially across the broader network.
Model Poisoning and Inference Hijacking: The attacker can manipulate the model to produce incorrect outputs, sabotage business processes relying on AI, or steal inference data sent to the model endpoint.
Supply Chain Risk: This vulnerability introduces a critical supply chain risk into the MLOps lifecycle, compromising models before they are even deployed.

IOCs — Directly from Articles

No specific Indicators of Compromise (IOCs) were provided in the source article, as it details a vulnerability rather than a specific active campaign.

Cyber Observables — Hunting Hints

Security teams may want to hunt for the following patterns to identify potential misuse or vulnerable configurations:

| Type | Value | Description |
| --- | --- | --- |
| Log Source | Google Cloud Audit Logs | Monitor for storage.buckets.create and storage.objects.create events. |
| Log Pattern | principalEmail does not match project | Alert when a GCS bucket is accessed or written to by a service account or user from a different project, especially for buckets with predictable names. |
| GCS Bucket Name | *-vertex-ai-staging | Proactively search for public or externally-owned buckets in your organization that match the default Vertex AI staging pattern. |
| Network Traffic | Egress from Vertex AI Endpoints | Monitor for unexpected outbound network connections from Vertex AI serving containers to unknown IP addresses or domains. |

Detection & Response

Detecting and responding to this threat requires a focus on both preventative and detective controls.

Detection Strategies:

Dependency Scanning: Regularly scan Python environments and requirements.txt files to identify vulnerable versions of the google-cloud-aiplatform SDK (1.139.0, 1.140.0).
Cloud Audit Logging: In Google Cloud Logging, create alerts based on queries that detect cross-project access to GCS buckets. Look for storage.objects.create events where the principalEmail (the actor) belongs to a different project than the bucket's parent project.
GCS Bucket Inventory: Regularly audit GCS buckets to ensure no buckets with the pattern [your-project-id]-[region]-vertex-ai-staging are owned by an external project.
D3FEND Techniques: Implement defensive measures like D3-NTA: Network Traffic Analysis to monitor for anomalous egress from Vertex AI serving environments and D3-FA: File Analysis on model artifacts before deployment.
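The cross-project check described under Cloud Audit Logging could be prototyped offline against exported log entries. This is a heuristic sketch, not a production detection: it assumes entries are parsed JSON dictionaries carrying protoPayload.methodName, protoPayload.authenticationInfo.principalEmail, and resource.labels.project_id, and it infers the actor's project from the usual user-managed service account address format. The function name is hypothetical. Human users, Google-managed service agents (gcp-sa-* domains), and other accounts whose email does not encode a project are skipped rather than flagged.

```python
import re

# User-managed service accounts normally look like
# NAME@PROJECT_ID.iam.gserviceaccount.com
SA_EMAIL = re.compile(
    r"^[^@]+@(?P<project>[a-z][a-z0-9-]+)\.iam\.gserviceaccount\.com$"
)


def is_cross_project_write(entry: dict) -> bool:
    """Flag object writes whose actor project differs from the bucket's."""
    payload = entry.get("protoPayload", {})
    if payload.get("methodName") != "storage.objects.create":
        return False

    email = payload.get("authenticationInfo", {}).get("principalEmail", "")
    bucket_project = entry.get("resource", {}).get("labels", {}).get("project_id")
    match = SA_EMAIL.match(email)
    if not match or not bucket_project:
        return False  # actor project cannot be inferred from the email

    actor_project = match.group("project")
    if actor_project.startswith("gcp-sa-"):
        return False  # Google-managed service agent, not a tenant project
    return actor_project != bucket_project


# Hypothetical entry shaped like an exported audit log record.
sample = {
    "protoPayload": {
        "methodName": "storage.objects.create",
        "authenticationInfo": {
            "principalEmail": "trainer@victim-proj.iam.gserviceaccount.com"
        },
    },
    "resource": {"labels": {"project_id": "other-proj"}},
}
print(is_cross_project_write(sample))
```

Legitimate cross-project pipelines will also match, so results from a filter like this are better treated as leads to review than as alerts.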

Mitigation

Immediate Actions:

Patch Immediately: The most critical mitigation is to upgrade the google-cloud-aiplatform SDK to version 1.148.0 or newer. This can be done by running: pip install --upgrade google-cloud-aiplatform.
Explicitly Define Staging Buckets: As a defense-in-depth measure, always specify a known, company-owned GCS bucket when uploading models. This overrides the vulnerable default behavior. Example:
```
from google.cloud import aiplatform

aiplatform.init(project='your-project', staging_bucket='gs://your-secure-staging-bucket')
# ... model upload code
```

Strategic Recommendations:

Principle of Least Privilege: Ensure the Vertex AI service account (service-[PROJECT_NUMBER]@gcp-sa-aiplatform.iam.gserviceaccount.com) has the minimum necessary permissions and does not have overly broad access to unrelated projects or data stores.
Secure MLOps Pipeline: Integrate security checks into your CI/CD pipeline for ML, including vulnerability scanning of dependencies, static analysis of code, and integrity checks of model artifacts.
D3FEND Hardening: Apply countermeasures such as D3-ACH: Application Configuration Hardening by enforcing the use of explicit staging buckets in all model upload scripts and CI/CD pipelines.
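One way to enforce explicit staging buckets in upload scripts is a small static check in the pipeline. The sketch below uses Python's ast module to find aiplatform.init(...) calls that omit the staging_bucket keyword. The function name is hypothetical, and the check is intentionally narrow: it only recognizes the literal name aiplatform, so import aliases and a staging_bucket passed directly to the upload call are not covered.

```python
import ast


def init_calls_missing_staging_bucket(source: str) -> list[int]:
    """Return line numbers of aiplatform.init(...) calls lacking staging_bucket."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_init = (
            isinstance(func, ast.Attribute)
            and func.attr == "init"
            and isinstance(func.value, ast.Name)
            and func.value.id == "aiplatform"
        )
        if not is_init:
            continue
        keyword_names = {kw.arg for kw in node.keywords}
        # kw.arg is None for **kwargs; its contents are unknown here,
        # so such calls are left for manual review instead of flagged.
        if "staging_bucket" not in keyword_names and None not in keyword_names:
            missing.append(node.lineno)
    return sorted(missing)


example = (
    "from google.cloud import aiplatform\n"
    "aiplatform.init(project='your-project')\n"
    "aiplatform.init(project='your-project', staging_bucket='gs://your-secure-staging-bucket')\n"
)
print(init_calls_missing_staging_bucket(example))
```

A pipeline step could run this over each upload script and fail the build when the returned list is non-empty.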

The primary and most effective countermeasure is to immediately update the google-cloud-aiplatform library across all development environments, CI/CD runners, and production systems. Organizations should implement a process to scan for vulnerable versions 1.139.0 and 1.140.0 and enforce an upgrade to version 1.148.0 or higher. This can be achieved by updating requirements.txt or pyproject.toml files and rebuilding dependent container images. CI/CD pipelines should include a step that fails the build if vulnerable versions of this library are detected. This directly applies the patch from Google, which introduces the necessary ownership verification for the staging bucket, completely neutralizing the 'bucket squatting' attack vector.
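The build-failing step mentioned above might look like the following sketch for requirements.txt files. It assumes exact pins written with ==; the function name and regular expression are illustrative, and unpinned or range-based requirements would need a check against the resolved environment instead.

```python
import re

# Versions named as vulnerable in this article.
VULNERABLE = {"1.139.0", "1.140.0"}

# Matches lines such as "google-cloud-aiplatform==1.139.0", with optional
# extras like "[prediction]". Only exact "==" pins are recognized.
PIN = re.compile(
    r"^\s*google-cloud-aiplatform(?:\[[^\]]*\])?\s*==\s*([0-9][^\s;#]*)",
    re.IGNORECASE,
)


def vulnerable_pins(requirements_text: str) -> list[str]:
    """Return each pinned google-cloud-aiplatform version that is vulnerable."""
    hits = []
    for line in requirements_text.splitlines():
        match = PIN.match(line)
        if match and match.group(1) in VULNERABLE:
            hits.append(match.group(1))
    return hits


sample_requirements = """\
requests==2.31.0
google-cloud-aiplatform==1.139.0
"""
print(vulnerable_pins(sample_requirements))
```

In a CI job, the file contents would be read from the repository and the job would exit with a non-zero status whenever the function returns any versions.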

Pickle in the Middle: Unit 42 Discovers Cross-Tenant RCE in Google Vertex AI via Model Upload Hijacking