Critical remote code execution (RCE) vulnerabilities have been found in popular AI inference engines, affecting frameworks developed by Meta, NVIDIA, Microsoft, and several open-source projects. Research from Oligo Security identified systemic weaknesses rooted in insecure data deserialization via Python's pickle module and ZeroMQ (ZMQ) messaging endpoints exposed to the network without authentication. Successful exploitation of these flaws could allow a remote attacker to execute arbitrary code on the AI server, leading to model theft, data poisoning, or a pivot into the broader corporate network. The discovery also highlights the problem of "Shadow Vulnerabilities": known but unpatched flaws that persist in widely used code forks, such as Microsoft's Sarathi-Serve, creating a hidden attack surface.
The core of the vulnerabilities lies in two primary insecure-by-default development practices common in the fast-moving AI space:
Insecure Deserialization with pickle: Many AI frameworks use Python's pickle module to serialize and deserialize data, including AI models and configurations. The pickle module is notoriously unsafe because it can execute arbitrary code while deserializing a maliciously crafted object. If an inference server accepts pickled data from an untrusted source, an attacker can send a malicious pickle payload to achieve RCE; against an exposed inference endpoint, this maps to T1190 - Exploit Public-Facing Application.
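A minimal, self-contained illustration of why this is dangerous: pickle's __reduce__ hook lets a serialized object name a callable for the deserializer to invoke. The payload below runs a harmless echo, but an attacker could substitute any shell command.

```python
import os
import pickle

class MaliciousPayload:
    """Any class can define __reduce__ to tell pickle how to 'reconstruct' it.
    A malicious sender abuses this so that pickle.loads() calls an arbitrary
    function during deserialization."""

    def __reduce__(self):
        # pickle will invoke os.system(...) while unpickling this object
        return (os.system, ("echo 'code executed during unpickling'",))

# Attacker side: craft the malicious bytes
payload = pickle.dumps(MaliciousPayload())

# Victim side: merely deserializing untrusted bytes runs the attacker's code.
# No method of the class is ever called explicitly.
pickle.loads(payload)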
Exposed ZeroMQ (ZMQ) Endpoints: ZMQ is a high-performance asynchronous messaging library used for communication between different components of the AI stack. The researchers found that many frameworks expose ZMQ endpoints to the network without any authentication or encryption (e.g., no CURVE keys or TLS). These endpoints typically deserialize incoming messages with recv_pyobj(), which unpickles whatever bytes arrive, so any attacker who can connect to the endpoint can send a malicious pickled Python object and again achieve RCE.
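A sketch of the vulnerable server pattern (the socket type and port are illustrative assumptions, not drawn from any specific framework): pyzmq's recv_pyobj() is pickle.loads() under the hood, so an unauthenticated network bind hands the deserializer to anyone who can reach the port.

```python
import zmq

context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://0.0.0.0:5555")    # reachable from the network, no auth

while True:
    request = socket.recv_pyobj()    # pickle.loads() on attacker-controlled bytes
    socket.send_pyobj({"status": "ok"})
```

Sending the MaliciousPayload object from the previous sketch to this socket would execute the attacker's command inside the server process.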
The vulnerabilities impact a wide range of popular AI/ML frameworks and servers, including:
- vLLM
- SGLang
- Modular

Sarathi-Serve is cited as an example of a project that inherited these vulnerabilities and remains unpatched, creating a "Shadow Vulnerability." While the source articles do not mention active in-the-wild exploitation, the public disclosure and the ease of exploitation make it highly likely that threat actors will begin targeting these systems. The vulnerabilities are straightforward to exploit for an attacker with network access to the vulnerable endpoints. Many of the core open-source projects have released patches, but the risk remains high for downstream applications and unmaintained forks.
A successful RCE attack on an AI inference server can have devastating consequences:
- Model theft: exfiltration of proprietary model weights and configurations.
- Data poisoning: tampering with the data and models the server processes.
- Lateral movement: using the compromised AI server as a pivot point into the broader corporate network.

Detection: Audit codebases for calls to pickle.load() or pickle.loads() on data from untrusted sources, and inspect serialized artifacts before loading them; this is a form of File Analysis (D3-FA). Verify that deployed frameworks such as vLLM are running patched releases. A lightweight static scan, sketched below, can kick-start the audit.
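One way to perform that audit is a small AST walk that flags every pickle.load()/pickle.loads() call site for manual review. This is a hypothetical helper script, not part of any framework or tool named above:

```python
import ast
import sys
from pathlib import Path

# Calls worth flagging for review; extend as needed (e.g., torch.load)
RISKY_CALLS = {("pickle", "load"), ("pickle", "loads")}

def scan_file(path: Path) -> None:
    """Parse one Python file and print every risky deserialization call."""
    try:
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
    except (SyntaxError, UnicodeDecodeError):
        return  # skip files that do not parse as Python
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            value = node.func.value
            if isinstance(value, ast.Name) and (value.id, node.func.attr) in RISKY_CALLS:
                print(f"{path}:{node.lineno}: {value.id}.{node.func.attr}() -- review data source")

if __name__ == "__main__":
    for py_file in Path(sys.argv[1]).rglob("*.py"):
        scan_file(py_file)
```

Run it against a source tree (e.g., `python scan_pickle.py path/to/repo`, where the script name is arbitrary) and trace every hit back to where its input originates.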
For Developers:
- Avoid pickle for untrusted data: Do not use pickle to deserialize data from untrusted or unauthenticated sources. Use safer serialization formats such as JSON for data interchange, as in the sketch below.
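Applied to the ZMQ pattern shown earlier, the fix can be as small as swapping the pickle-based helpers for pyzmq's JSON equivalents (socket type and port are again illustrative):

```python
import zmq

context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://127.0.0.1:5555")  # also prefer localhost over 0.0.0.0

while True:
    try:
        request = socket.recv_json()  # json.loads() under the hood: data, not code
    except ValueError:
        socket.send_json({"error": "malformed request"})
        continue
    socket.send_json({"status": "ok", "echo": request})
```

recv_json() parses plain data; a malformed or malicious message can at worst raise a parsing error, never execute code.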
For Organizations:
- Patch promptly: Update affected AI frameworks to fixed releases as soon as vendors publish them; this is a Software Update (D3-SU) action.
- Harden the configuration of AI frameworks to disable unsafe features and enforce authentication on communication channels (see the CURVE sketch after the technique mapping below).
Mapped D3FEND Techniques: Application Configuration Hardening (D3-ACH)
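For the authentication piece specifically, pyzmq ships CURVE support. The following is a minimal sketch assuming libzmq is built with libsodium; key generation is inlined for brevity, whereas real deployments should distribute keys out of band (e.g., via zmq.auth certificate files) and can add a ThreadAuthenticator to allowlist specific client keys:

```python
import zmq

# Generate keypairs for demo purposes only; distribute real keys out of band
server_public, server_secret = zmq.curve_keypair()
client_public, client_secret = zmq.curve_keypair()

context = zmq.Context()

server = context.socket(zmq.REP)
server.curve_secretkey = server_secret
server.curve_publickey = server_public
server.curve_server = True                # require a CURVE handshake from every client
server.bind("tcp://127.0.0.1:5556")

client = context.socket(zmq.REQ)
client.curve_secretkey = client_secret
client.curve_publickey = client_public
client.curve_serverkey = server_public    # client must know the server's public key
client.connect("tcp://127.0.0.1:5556")

client.send_json({"ping": 1})
print(server.recv_json())                 # handshake succeeded; traffic is encrypted
server.send_json({"pong": 1})
print(client.recv_json())
```

With curve_server enabled, a client that cannot complete the CURVE handshake against the server's public key is rejected before any payload reaches the application-level deserializer.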
Update affected AI frameworks to patched versions that address the deserialization and ZMQ vulnerabilities.
Mapped D3FEND Techniques: Software Update (D3-SU)
Run AI inference servers in isolated or sandboxed environments to limit the impact of a potential compromise.
Mapped D3FEND Techniques: Execution Isolation (D3-EI)
