At this point they should have some sort of static code scanner running over the repos at rest, looking for certain patterns and issuing warnings.
Polymorphic malware is probably one of the easier things to do with LLMs, so static scanners seem of limited use.
Since this installed a malicious dependency from npm (and later via Bun) in the preinstall script, catching it would require at least some complex correlation. Possibly building and installing every AUR package, which would cost far too much for the Arch team.
Individually and automatically scanning only the PKGBUILDs (the stuff actually on the AUR) would likely not have caught this.
That doesn’t mean it’s a bad idea to run a basic scan over every change, but it wouldn’t magically “fix” AUR malware.
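For context on where such a payload can hide: npm automatically runs a package's `preinstall`, `install` and `postinstall` scripts when it installs that package. The rough Python sketch below assumes the payload used these standard npm hooks; `install_hooks` is a hypothetical helper, not part of any existing tool. It lists the hooks declared in a dependency's `package.json`. That manifest comes from the npm registry, not from the AUR, so a scanner that only reads PKGBUILDs never sees it.

```python
import json

# Lifecycle scripts that npm runs automatically while installing a package.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_hooks(package_json_text: str) -> dict[str, str]:
    """Return the install-time lifecycle scripts declared in a package.json (hypothetical helper)."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    # Keep only the hooks that fire during installation; ignore "test", "build", etc.
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}


# A made-up dependency manifest, for illustration only.
example = '{"name": "some-dep", "scripts": {"preinstall": "node setup.js", "test": "jest"}}'
print(install_hooks(example))
```

To use this on a real package you would first have to resolve and download the whole dependency tree. That is the extra correlation step the comment above describes.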
It’s enough to build a pattern match and scan for it elsewhere. Surely they did at least that much to find all these packages with malware.
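A naive version of that idea might look like the rough Python sketch below. The signature list is made up for illustration and is not the pattern from this incident. `scan_text` and `scan_tree` are hypothetical helpers, and the sketch assumes a local checkout of AUR git repos containing PKGBUILDs.

```python
import re
from pathlib import Path

# Hypothetical signatures; a real list would be derived from the known-bad packages.
SIGNATURES = [
    re.compile(r"curl\s+[^|]*\|\s*(ba)?sh"),  # piping a download straight into a shell
    re.compile(r"\bnpm\s+(install|i)\b"),     # pulling npm packages during the build
    re.compile(r"\bbun\s+(install|add|x)\b"), # the same thing via Bun
]


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs that match any signature."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(sig.search(line) for sig in SIGNATURES):
            hits.append((lineno, line.strip()))
    return hits


def scan_tree(root: Path) -> dict[Path, list[tuple[int, str]]]:
    """Scan every PKGBUILD below root and collect the files that have hits."""
    results = {}
    for pkgbuild in root.rglob("PKGBUILD"):
        hits = scan_text(pkgbuild.read_text(errors="replace"))
        if hits:
            results[pkgbuild] = hits
    return results


sample = 'pkgname=foo\nbuild() {\n  cd "$srcdir/foo"\n  npm install\n}\n'
print(scan_text(sample))
```

The `npm install` rule also matches plenty of legitimate Node-based packages. So every hit still needs a human to look at it.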
I wish it were that simple, but I doubt there is any scanner that can differentiate between legitimate and malicious code.
Maybe an AI, but even then it would probably be quite unreliable.
Unreliable is still a step up from completely absent.
Maybe. But an unreliable scanner means a human has to check all the false positives and false negatives, which can quickly eat up a lot of time on projects run by volunteer devs.
It’s really important to keep in mind that this is done for free, and that supply chain attacks like this one are very hard to identify.
I mean, this is usually not the devs being careless; these are very complex attacks on projects with very limited resources. Attackers sometimes even purposefully choose projects that are “understaffed” (well, more understaffed than others).