Heuristic Static Analysis Tool GuardDog Used to Detect Several Malicious PyPI Packages

GuardDog is a new open-source tool for identifying malicious Python packages using Semgrep and package metadata analysis. Thanks to a set of source code heuristics, GuardDog can detect malicious packages it has never seen before, and it has been used to identify several malicious PyPI packages in the wild.

Datadog, maker of GuardDog, reverse-engineered a number of known malicious PyPI packages to identify common attack vectors and techniques. These include mimicking a legitimate package name (typosquatting) or a maintainer's account or email domain to induce a victim to install the package; executing code at install time, especially in the post-install step, or downloading a second-stage executable; and exfiltrating sensitive data, such as AWS access keys.

"GuardDog makes use of [static analysis] to identify malicious packages. [...] To detect malicious behavior, we use a set of heuristics designed to capture the patterns we observed. These heuristics within GuardDog scan for suspicious patterns from two locations: the source code and the package metadata on PyPI."

GuardDog's source-code heuristics are implemented as Semgrep rules. They can detect a command being overridden in setup.py so that it executes a system command; attempts to execute Base64-encoded data or images using eval or exec; attempts to execute a file downloaded from the Internet; the inclusion of any environment variable in the payload of an outgoing network request, which can be used to exfiltrate sensitive data; and the use of suspicious domains, including .xyz and .top, or of shortened URLs.
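To make one of these heuristics concrete, the sketch below flags calls to eval or exec whose argument contains a Base64 decoding call. It is not GuardDog's implementation: GuardDog expresses its heuristics as Semgrep rules, whereas this illustration uses Python's standard ast module, and the function names find_exec_of_base64 and _call_name are hypothetical.

```python
import ast

EXECUTORS = {"eval", "exec"}   # functions that run code supplied as data
DECODERS = {"b64decode"}       # assumption: only base64.b64decode is treated as a decoder


def _call_name(call):
    """Return the trailing name of a call target, e.g. 'b64decode' for base64.b64decode(...)."""
    func = call.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def find_exec_of_base64(source):
    """Return sorted line numbers where eval/exec receives Base64-decoded data."""
    findings = set()
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and _call_name(node) in EXECUTORS):
            continue
        for arg in node.args:
            # Look for a decoder call anywhere inside the argument expression.
            if any(
                isinstance(inner, ast.Call) and _call_name(inner) in DECODERS
                for inner in ast.walk(arg)
            ):
                findings.add(node.lineno)
                break
    return sorted(findings)
```

Because the check is purely syntactic, it only catches the case where decoding happens inside the eval or exec call itself; a payload decoded on an earlier line would need data-flow tracking to catch.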

Specifically, GuardDog leverages Semgrep's intra-procedural taint tracking, which analyzes the flow of data through a program to identify cases where such data is not transformed or sanitized before reaching a vulnerable function.

Besides source code, GuardDog scans package metadata against another set of heuristics, including typosquatting, changes in a package maintainer's email, and missing package information.
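A typosquatting check of this kind can be sketched by comparing a candidate name against a list of well-known packages using edit distance. Everything here is an assumption for illustration: the short POPULAR list, the threshold of one edit, and the function names are hypothetical, and the article does not state which similarity measure GuardDog itself uses.

```python
# Hypothetical short list; a real check would use a much larger set of popular packages.
POPULAR = ("requests", "numpy", "pandas", "urllib3", "boto3")


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def possible_typosquat_of(name, popular=POPULAR, max_distance=1):
    """Return popular packages whose names are within max_distance edits of name."""
    normalized = name.lower().replace("_", "-")
    return [
        p for p in popular
        if p != normalized and edit_distance(normalized, p) <= max_distance
    ]
```

For example, possible_typosquat_of("requsts") returns ["requests"], while the legitimate name "requests" matches nothing. Note that a swapped pair of letters counts as two edits under plain Levenshtein distance, so a stricter or looser threshold changes what is caught.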

GuardDog's ability to detect malicious packages has been tested by running it against PyPI, leading to the identification of a number of packages that used one or more of the techniques described above to run malicious code or steal sensitive data.

GuardDog can be installed using pip or downloaded from GitHub.

About the Author

Sergio De Simone
