Slopsquatting targets LLM coders with supply-chain attacks
Sometimes LLMs generate fake package names. Attackers know this, and publish malicious packages under these hallucinated names.
Researchers have warned us about LLM package hallucinations for a while. But now researchers have discovered that the hallucinations are frequent and stable: depending on the model, roughly 5 to 20 percent of suggested package names don’t exist, and many of the same phantom names recur across runs. Attackers are squatting on these names. They are doing this today.
To back up: in June of 2023 and June of 2024, security researchers explored LLMs hallucinating package names in popular languages. This was a variant of “typosquatting”, where an attacker publishes a package with an easily confused name and waits for people to download it by accident.
The idea behind the slopsquatting attack is simple:

1. An LLM generates code that references a nonexistent package.
2. An attacker publishes a package under that name. Maybe there is a malicious payload now. Maybe a later update will have the malicious payload.
3. A developer generates code containing the fake package name.
4. The package exists and seems to work, so the developer accepts the changes into their project.
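One cheap countermeasure is to verify that every dependency you’re about to install actually resolves in the registry. Here’s a minimal sketch, assuming a plain requirements.txt with one requirement per line, using PyPI’s public JSON API; this is my illustration, not anything the researchers published.

```python
"""Sketch: flag requirements.txt entries that don't resolve on PyPI.

Assumes one requirement per line ("name==version" or a bare name);
skips comments, blank lines, and pip options for brevity.
"""
import re
import sys
import urllib.error
import urllib.request

PYPI_URL = "https://pypi.org/pypi/{name}/json"  # PyPI's public JSON API

def package_exists(name: str) -> bool:
    try:
        with urllib.request.urlopen(PYPI_URL.format(name=name), timeout=10):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:  # unknown project: possibly hallucinated
            return False
        raise  # any other HTTP error: don't guess

def main(path: str) -> None:
    for line in open(path):
        line = line.split("#")[0].strip()
        if not line or line.startswith("-"):
            continue  # blank line, comment, or pip option
        name = re.split(r"[=<>!~\[; ]", line, maxsplit=1)[0]
        if not package_exists(name):
            print(f"WARNING: {name!r} not found on PyPI -- review before installing")

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "requirements.txt")
```

The obvious limitation: once an attacker squats the hallucinated name, the package does exist, so an existence check only catches names nobody has claimed yet. For anything that resolves but looks unfamiliar, you still want to check upload dates and download history by hand.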
To prove their point, one of the researchers published a fake empty package under the name huggingface-cli.

The result, he claims, is that huggingface-cli received more than 15,000 authentic downloads in the three months it has been available.

"In addition, we conducted a search on GitHub to determine whether this package was utilized within other companies' repositories," Lanyado said in the write-up for his experiment.
"Our findings revealed that several large companies either use or recommend this package in their repositories. For instance, instructions for installing this package can be found in the README of a repository dedicated to research conducted by Alibaba."
Alibaba did not respond to a request for comment.
Lanyado also said that there was a Hugging Face-owned project that incorporated the fake huggingface-cli, but that was removed after he alerted the biz.
And I have some sympathy for the Hugging Face employee who added a dependency on the fake package[1]. It’s lazy to point a finger at them. However, the modern package manager situation is unbelievable. I’m supposed to just download these third-party libraries with dozens of dependencies across dozens of authors and hope that it’s all going to work out in the end? You’ll wake up screaming every night if you think too hard about it.
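If you want to see the scale of that trust for yourself, here’s a small sketch that uses only the standard library to tally the distributions installed in your current Python environment and the dependencies each one declares:

```python
"""Sketch: tally how many third-party distributions your environment trusts."""
from importlib import metadata

total = 0
for dist in metadata.distributions():
    name = dist.metadata["Name"]
    declared = dist.requires or []  # requirement strings from package metadata
    total += 1
    print(f"{name}: {len(declared)} declared dependencies")

print(f"{total} distributions installed -- every one a publisher you implicitly trust")
```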
So a fake package with a plausible name can slip right through. But how often do LLMs actually cause this to happen? Thanks to researchers, we now have some data.
First, LLMs hallucinate fake packages a lot.
In a recent study, researchers found that about 5.2 percent of package suggestions from commercial models didn't exist, compared to 21.7 percent from open source or openly available models.
Second, the names they hallucinate tend to be persistent.
The recurrence appears to follow a bimodal pattern - some hallucinated names show up repeatedly when prompts are re-run, while others vanish entirely - suggesting certain prompts reliably produce the same phantom packages.
As noted by security firm Socket recently, the academic researchers who explored the subject last year found that re-running the same hallucination-triggering prompt ten times resulted in 43 percent of hallucinated packages being repeated every time and 39 percent never reappearing.
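To make the bimodal claim concrete, the measurement looks roughly like this in Python. The suggest_packages helper is a hypothetical stand-in for whatever model call the researchers actually used; this is a sketch of the methodology, not their published harness.

```python
"""Sketch: measure how often hallucinated names recur across prompt re-runs."""
from collections import Counter

RUNS = 10  # the study re-ran each prompt ten times

def suggest_packages(prompt: str) -> set[str]:
    """Hypothetical stand-in for a model call that returns the package
    names referenced by the generated code."""
    raise NotImplementedError

def recurrence_profile(prompt: str, registry: set[str]) -> tuple[float, float]:
    """Return the fraction of hallucinated names seen in every run,
    and the fraction seen in exactly one run."""
    counts: Counter[str] = Counter()
    for _ in range(RUNS):
        counts.update(suggest_packages(prompt) - registry)  # keep only phantoms
    if not counts:
        return 0.0, 0.0
    every_time = sum(1 for c in counts.values() if c == RUNS) / len(counts)
    one_shot = sum(1 for c in counts.values() if c == 1) / len(counts)
    return every_time, one_shot  # the study reported roughly 43% and 39%
```

The “repeats every time” bucket is the dangerous one, because those are exactly the names an attacker can squat on and reliably wait for.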
The researchers also noted that open-source models tended to hallucinate package names about 4 times as often as their commercial counterparts. So, as of September 2024 — which is when this research was published — you get what you pay for.
“But that was September, and it’s April now! The LLM world moves so fast! Surely all of the models have had time to adapt.” If you believe that, then go ahead and skip checking your dependencies. That bet is way above my own personal risk tolerance. Personally, I’m going to keep reviewing the code that LLMs generate until I’m made obsolete as a reviewer.
But take heart! People are doing this in the wild and training each other.
Socket CEO Feross Aboukhadijeh also noted that recently a threat actor using the name "_Iain" published a playbook on a dark web forum detailing how to build a blockchain-based botnet using malicious npm packages.
Aboukhadijeh explained that _Iain "automated the creation of thousands of typo-squatted packages (many targeting crypto libraries) and even used ChatGPT to generate realistic-sounding variants of real package names at scale. He shared video tutorials walking others through the process, from publishing the packages to executing payloads on infected machines via a GUI. It’s a clear example of how attackers are weaponizing AI to accelerate software supply chain attacks."
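Typo-squatted packages published at that scale are, by definition, brand new. One defensive heuristic (my own illustration, not something from Socket’s write-up) is to flag any dependency whose registry record was created recently. Here’s a sketch against the public npm registry metadata endpoint, with an arbitrary 90-day threshold:

```python
"""Sketch: flag npm dependencies whose registry entry is suspiciously new."""
import json
import urllib.request
from datetime import datetime, timezone
from urllib.parse import quote

REGISTRY = "https://registry.npmjs.org/{name}"  # public registry metadata
MIN_AGE_DAYS = 90  # arbitrary threshold, for illustration only

def age_in_days(name: str) -> int:
    # quote() handles scoped names like @scope/pkg in the URL path
    url = REGISTRY.format(name=quote(name, safe="@"))
    with urllib.request.urlopen(url, timeout=10) as resp:
        meta = json.load(resp)
    created = datetime.fromisoformat(meta["time"]["created"].replace("Z", "+00:00"))
    return (datetime.now(timezone.utc) - created).days

def check(package_json_path: str = "package.json") -> None:
    with open(package_json_path) as f:
        deps = json.load(f).get("dependencies", {})
    for name in deps:
        if age_in_days(name) < MIN_AGE_DAYS:
            print(f"WARNING: {name} was first published under {MIN_AGE_DAYS} days ago")
```

Age alone proves nothing, of course; it just shrinks the pile of packages that deserve a closer look.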
I tried to find these videos, but this was futile since I don’t know anything about the dark web. The extent of my l33t haxxing was running a ton of search queries against Bing and Yandex instead of Google.
I think the only silver lining is that this guy is selling video courses. If he were wildly successful with this approach he wouldn’t need to grind out the videos to make extra money. But you know what they say about attacks: they never get worse. They only get better and better.
[1] As long as they didn’t have YOLO mode enabled. Not even once.