Malicious Hugging Face Repo Impersonating OpenAI Distributes Infostealer Malware

Executive Summary

A sophisticated supply chain attack has targeted the AI/ML developer community through the popular Hugging Face platform. Threat actors created a malicious repository that impersonated a legitimate project from OpenAI, tricking users into downloading infostealing malware. The fake repository, which typosquatted OpenAI's "Privacy Filter" project, leveraged social engineering and likely bots to reach the #1 trending spot on Hugging Face, lending it an air of legitimacy. Before being taken down, the repository had recorded over 244,000 downloads, a figure that likely includes the bot-generated activity used to inflate its ranking. The payload was an infostealer designed to harvest a wide range of sensitive data from victims' machines. This incident highlights the vulnerability of open-source ecosystems and the increasing trend of attackers targeting the AI development pipeline.

Threat Overview

The attack centered on a malicious Hugging Face repository named Open-OSS/privacy-filter. The attackers employed several deceptive tactics:

Impersonation: The repository's name was a typosquat of a real OpenAI project, and its description was copied almost verbatim to deceive users.
Popularity Inflation: The project was artificially boosted to the #1 trending position on Hugging Face, likely using bots to generate downloads and "likes." This manufactured social proof made the repository appear trustworthy and popular.
Promotion: The attackers promoted the malicious repository on platforms like LinkedIn and Reddit to drive traffic and downloads.

The malware payload was an infostealer targeting Windows systems. It was equipped with anti-analysis capabilities, including checks for virtual machines, sandboxes, and debuggers, to evade detection. Upon execution, the malware would steal a wide array of sensitive data, including:

User credentials
Browser session tokens
Cryptocurrency wallets

The stolen data was compressed and exfiltrated to a command-and-control (C2) server located at recargapopular[.]com.

Technical Analysis

This campaign is a classic example of a supply chain attack targeting developers by poisoning an open-source repository.

MITRE ATT&CK Techniques:

T1195.001 - Supply Chain Compromise: Compromise Software Dependencies and Development Tools: The attackers published malicious code to a trusted public repository (Hugging Face) to distribute malware.
T1555 - Credentials from Password Stores: The infostealer was designed to steal credentials from browsers and other applications.
T1539 - Steal Web Session Cookie: The malware targeted browser session tokens to hijack active user sessions.
T1528 - Steal Application Access Token: The malware targeted cryptocurrency wallets, which involves stealing access tokens or private keys.
T1497 - Virtualization/Sandbox Evasion: The malware included checks for VMs and debuggers to avoid analysis.
T1071.001 - Application Layer Protocol: Web Protocols: The stolen data was exfiltrated over HTTP/HTTPS to the C2 server.

Impact Assessment

Developers and organizations that downloaded and used code from this malicious repository are at high risk.

Credential Compromise: Stolen credentials can be used to access corporate networks, cloud environments, and other sensitive systems.
Financial Loss: The theft of cryptocurrency wallet keys can lead to the immediate and irreversible loss of funds.
Session Hijacking: Stolen session tokens allow attackers to bypass MFA and take over active sessions for web services like email, source code repositories, and cloud consoles.
Further Intrusion: The compromised developer machines can be used as a beachhead for further attacks against their employer's network.

IOCs — Directly from Articles

Type: Domain
Value: recargapopular[.]com
Description: Command-and-control (C2) server

Type: Other
Value: Open-OSS/privacy-filter
Description: Malicious Hugging Face repository name

Cyber Observables — Hunting Hints

Network Traffic: Monitor for any DNS requests or outbound connections to recargapopular[.]com.
File System: Search developer workstations for any files or projects downloaded from the Open-OSS/privacy-filter repository on Hugging Face.
Process Monitoring: Look for suspicious processes that check for VMware, VirtualBox, or debugger processes before executing their main payload.
Log Analysis: Review web proxy logs to see if any internal systems accessed the malicious Hugging Face repository URL.
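
The network, log, and file system hints above can be sketched as a small hunting script. This is a minimal illustration, not a complete detection: it assumes logs are available as plain text lines, that the repository was fetched as a model through the Hugging Face Hub client, and that the client used its default cache layout (a folder named models--<org>--<name> under ~/.cache/huggingface/hub, or under HF_HUB_CACHE if that variable is set). The function names are hypothetical, and copies obtained another way (for example with git clone) would need a different search.

```python
import os
import re
from pathlib import Path

# Indicators from this report; the domain is refanged for matching.
C2_DOMAIN = "recargapopular.com"
MALICIOUS_REPO = "Open-OSS/privacy-filter"

# Match the C2 domain or any subdomain, but not look-alikes such as
# "notrecargapopular.com" or "recargapopular.com.example.net".
_DOMAIN_RE = re.compile(
    r"(?<![\w-])(?:[\w-]+\.)*" + re.escape(C2_DOMAIN) + r"(?![\w-]|\.[\w-])",
    re.IGNORECASE,
)


def find_ioc_lines(log_lines):
    """Return the log lines that mention the C2 domain or the repository id."""
    repo = MALICIOUS_REPO.lower()
    return [
        line for line in log_lines
        if _DOMAIN_RE.search(line) or repo in line.lower()
    ]


def default_hf_cache():
    """Assumed default Hub cache location; HF_HUB_CACHE overrides it if set."""
    override = os.environ.get("HF_HUB_CACHE")
    if override:
        return Path(override)
    return Path.home() / ".cache" / "huggingface" / "hub"


def find_cached_repo(cache_root):
    """Return cache folders whose name matches the malicious repository."""
    # Assumption: the repository was cached as a model ("models--" prefix).
    wanted = ("models--" + MALICIOUS_REPO.replace("/", "--")).lower()
    root = Path(cache_root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.name.lower() == wanted)
```

On a workstation, find_cached_repo(default_hf_cache()) returns any matching cache folders, and find_ioc_lines can be fed the lines of an exported DNS or proxy log. A hit from either function is a lead for investigation rather than proof of execution.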

Detection & Response

IOC Scanning: Block the C2 domain recargapopular[.]com at the network perimeter. Scan network logs for any historical connections.
Endpoint Detection (D3FEND: D3-PA - Process Analysis): EDR tools can be configured to detect common infostealer behaviors, such as a process accessing credential stores of multiple web browsers, querying for cryptocurrency wallet files, and then making an external network connection.
Incident Response: Any machine where code from the malicious repository was executed should be considered fully compromised. The machine should be isolated and reimaged. All credentials (personal and corporate) used or stored on that machine must be rotated immediately.
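
The behavioral pattern described under Endpoint Detection can be illustrated with a short correlation sketch. It assumes endpoint telemetry has already been exported as an ordered stream of simple events. The Event type, the event kinds, and the "wallet" substring heuristic are hypothetical choices for illustration, not the schema or logic of any particular EDR product. The browser path fragments are the usual Windows profile locations for Chrome, Edge, and Firefox.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    pid: int
    kind: str    # "file_read" or "net_connect" (hypothetical event kinds)
    target: str  # file path for reads, remote host for connections


# Lowercased path fragments that identify a browser profile directory.
BROWSER_MARKERS = {
    "chrome": r"google\chrome\user data",
    "edge": r"microsoft\edge\user data",
    "firefox": r"mozilla\firefox\profiles",
}


def flag_suspicious(events, min_browsers=2):
    """Return sorted PIDs that read several browsers' profile data and a
    wallet-like file, and only afterwards opened a network connection."""
    browsers_seen = defaultdict(set)
    wallet_seen = set()
    flagged = set()
    for ev in events:  # events are assumed to be in chronological order
        if ev.kind == "file_read":
            path = ev.target.lower()
            for name, marker in BROWSER_MARKERS.items():
                if marker in path:
                    browsers_seen[ev.pid].add(name)
            if "wallet" in path:  # crude illustrative heuristic
                wallet_seen.add(ev.pid)
        elif ev.kind == "net_connect":
            if len(browsers_seen[ev.pid]) >= min_browsers and ev.pid in wallet_seen:
                flagged.add(ev.pid)
    return sorted(flagged)
```

Requiring all three behaviors in sequence keeps a browser's own file access, or an unrelated network connection, from triggering an alert by itself. Real deployments would tune the markers and thresholds to their environment.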

Mitigation

Scrutinize Open-Source Packages: Do not blindly trust packages, even on reputable platforms. Scrutinize repository names for typosquatting. Check the history and reputation of the publisher. Be wary of new packages that rapidly gain popularity.
Use Sandboxed Environments: Test new or untrusted code in an isolated, sandboxed environment before running it on a production or development machine.
Restrict Permissions: Run development tools and processes with the lowest possible privileges. Avoid running code downloaded from the internet as an administrator.
Developer Training (D3FEND: D3-UT - User Training): Train developers on the risks of supply chain attacks and how to spot suspicious open-source projects. This includes verifying publishers, checking for typosquatting, and being skeptical of projects with suspiciously high, inorganic-looking popularity metrics.
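
The publisher and typosquatting checks above can be partly automated before a repository is downloaded. The sketch below compares a requested repository id against an allowlist and flags ids that reuse a trusted project name under a different publisher, or that differ from a trusted id by only a few characters. The allowlist entry openai/privacy-filter is an assumption about the legitimate project's id; the function name, the case-insensitive comparison, and the 0.8 similarity threshold are likewise illustrative choices.

```python
from difflib import SequenceMatcher

# Assumption: the id of the legitimate project being impersonated.
TRUSTED_REPOS = {"openai/privacy-filter"}


def check_repo(repo_id, trusted=TRUSTED_REPOS, threshold=0.8):
    """Classify a repo id as 'trusted', 'suspicious', or 'unknown'."""
    rid = repo_id.strip().lower()
    trusted_ids = {t.lower() for t in trusted}
    if rid in trusted_ids:
        return "trusted"

    org, _, name = rid.partition("/")
    for t in trusted_ids:
        t_org, _, t_name = t.partition("/")
        # Same project name published under a different organization.
        if name and name == t_name and org != t_org:
            return "suspicious"
        # Near-identical id overall, e.g. a single swapped character.
        if SequenceMatcher(None, rid, t).ratio() >= threshold:
            return "suspicious"
    return "unknown"
```

With this allowlist, check_repo("Open-OSS/privacy-filter") returns "suspicious" because the project name matches a trusted repository while the publisher does not. An "unknown" result only means the id resembles nothing on the allowlist, so it still calls for the manual review described above.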

To combat supply chain attacks like the one on Hugging Face, organizations must treat all new open-source components as potentially malicious. A key defense is to subject these components to Dynamic Analysis in a secure, isolated sandbox before they are introduced into the development lifecycle. For the fake 'privacy-filter' package, a sandbox would execute the code and monitor its behavior. It would observe the malware's anti-VM checks, its attempts to access browser credential stores (e.g., Local State and Login Data files in Chrome), its search for crypto wallet files, and critically, its attempt to exfiltrate data to the C2 server at recargapopular[.]com. This provides a definitive verdict on the malicious nature of the package without ever exposing a real developer machine to risk. This process should be automated as part of a secure software development lifecycle (SSDLC).
