Packj - Large-Scale Security Analysis Platform To Detect Malicious/Risky Open-Source Packages


Packj (pronounced package) is a command line (CLI) tool to vet open-source software packages for "risky" attributes that make them vulnerable to supply chain attacks. This is the tool behind our large-scale security analysis platform Packj.dev that continuously vets packages and provides free reports.


How to use

Packj accepts two input args:

  • name of the registry or package manager, pypi, npm, or rubygems.
  • name of the package to be vetted

Packj supports vetting of PyPI, NPM, and RubyGems packages. It performs static code analysis and checks for several metadata attributes such as release timestamps, author email, downloads, dependencies. Packages with expired email domains, large release time gap, sensitive APIs, etc. are flagged as risky for security reasons.

Packj also analyzes public repo code as well as metadata (e.g., stars, forks). By comparing the repo description and package title, you can be sure if the package indeed has been created from the repo to mitigate any starjacking attacks.

Containerized

The best way to use Packj is to run it inside Docker (or Podman) container. You can pull our latest image from DockerHub to get started.

docker pull ossillate/packj:latest

$ docker run --mount type=bind,source=/tmp,target=/tmp ossillate/packj:latest npm browserify
[+] Fetching 'browserify' from npm...OK [ver 17.0.0]
[+] Checking version...ALERT [598 days old]
[+] Checking release history...OK [484 version(s)]
[+] Checking release time gap...OK [68 days since last release]
[+] Checking author...OK [[email protected]]
[+] Checking email/domain validity...ALERT [expired author email domain]
[+] Checking readme...OK [26838 bytes]
[+] Checking homepage...OK [https://github.com/browserify/browserify#readme]
[+] Checking downloads...OK [2.2M weekly]
[+] Checking repo_url URL...OK [https://github.com/browserify/browserify]
[+] Checking repo data...OK [stars: 14077, forks: 1236]
[+] Checking repo activity...OK [commits: 2290, contributors: 207, tags: 413]
[+] Checking for CVEs...OK [none found]
[+] Checking dependencies...ALERT [48 found]
[+] Downloading package 'browserify' (ver 17. 0.0) from npm...OK [163.83 KB]
[+] Analyzing code...ALERT [needs 3 perms: process,file,codegen]
[+] Checking files/funcs...OK [429 files (383 .js), 744 funcs, LoC: 9.7K]
=============================================
[+] 5 risk(s) found, package is undesirable!
=> Complete report: /tmp/npm-browserify-17.0.0.json
{
"undesirable": [
"old package: 598 days old",
"invalid or no author email: expired author email domain",
"generates new code at runtime",
"reads files and dirs",
"forks or exits OS processes",
]
}

Specific package versions to be vetted could be specified using ==. Please refer to the example below

$ docker run --mount type=bind,source=/tmp,target=/tmp ossillate/packj:latest pypi requests==2.18.4
[+] Fetching 'requests' from pypi...OK [ver 2.18.4]
[+] Checking version...ALERT [1750 days old]
[+] Checking release history...OK [142 version(s)]
[+] Checking release time gap...OK [14 days since last release]
[+] Checking author...OK [[email protected]]
[+] Checking email/domain validity...OK [[email protected]]
[+] Checking readme...OK [49006 bytes]
[+] Checking homepage...OK [http://python-requests.org]
[+] Checking downloads...OK [50M weekly]
[+] Checking repo_url URL...OK [https://github.com/psf/requests]
[+] Checking repo data...OK [stars: 47547, forks: 8758]
[+] Checking repo activity...OK [commits: 6112, contributors: 725, tags: 144]
[+] Checking for CVEs...ALERT [2 found]
[+] Checking dependencies...OK [9 direct]
[+] Downloading package 'requests' (ver 2.18.4) from pypi...OK [123.27 KB]
[+ ] Analyzing code...ALERT [needs 4 perms: codegen,process,file,network]
[+] Checking files/funcs...OK [47 files (33 .py), 578 funcs, LoC: 13.9K]
=============================================
[+] 6 risk(s) found, package is undesirable, vulnerable!
{
"undesirable": [
"old package: 1744 days old",
"invalid or no homepage: insecure webpage",
"generates new code at runtime",
"fetches data over the network",
"reads files and dirs",
],
"vulnerable": [
"contains CVE-2018-18074,CVE-2018-18074"
]
}
=> Complete report: /tmp/pypi-requests-2.18.4.json
=> View pre-vetted package report at https://packj.dev/package/PyPi/requests/2.18.4

Non-containerized

Alternatively, you can install Python/Ruby dependencies locally and test it.

NOTE

  • Packj has only been tested on Linux.
  • Requires Python3 and Ruby. API analysis will fail if used with Python2.
  • You will have to install Python and Ruby dependencies before using the tool:
    • pip install -r requirements.txt
    • gem install google-protobuf:3.21.2 rubocop:1.31.1
$ python3 main.py npm eslint
[+] Fetching 'eslint' from npm...OK [ver 8.16.0]
[+] Checking version...OK [10 days old]
[+] Checking release history...OK [305 version(s)]
[+] Checking release time gap...OK [15 days since last release]
[+] Checking author...OK [[email protected]]
[+] Checking email/domain validity...OK [[email protected]]
[+] Checking readme...OK [18234 bytes]
[+] Checking homepage...OK [https://eslint.org]
[+] Checking downloads...OK [23.8M weekly]
[+] Checking repo_url URL...OK [https://github.com/eslint/eslint]
[+] Checking repo data...OK [stars: 20669, forks: 3689]
[+] Checking repo activity...OK [commits: 8447, contributors: 1013, tags: 302]
[+] Checking for CVEs...OK [none found]
[+] Checking dependencies...ALERT [35 found]
[+] Downloading package 'eslint' (ver 8.16.0) from npm...OK [490.14 KB]
[+] Analyzing code...ALERT [needs 2 perms: codegen,file]
[+ ] Checking files/funcs...OK [395 files (390 .js), 1022 funcs, LoC: 76.3K]
=============================================
[+] 2 risk(s) found, package is undesirable!
{
"undesirable": [
"generates new code at runtime",
"reads files and dirs: ['package/lib/cli-engine/load-rules.js:37', 'package/lib/cli-engine/file-enumerator.js:142']"
]
}
=> Complete report: /tmp/npm-eslint-8.16.0.json

How it works

  • It first downloads the metadata from the registry using their APIs and analyze it for "risky" attributes.
  • To perform API analysis, the package is downloaded from the registry using their APIs into a temp dir. Then, packj performs static code analysis to detect API usage. API analysis is based on MalOSS, a research project from our group at Georgia Tech.
  • Vulnerabilities (CVEs) are checked by pulling info from OSV database at OSV
  • Python PyPI and NPM package downloads are fetched from pypistats and npmjs
  • All risks detected are aggregated and reported

Risky attributes

The design of Packj is guided by our study of 651 malware samples of documented open-source software supply chain attacks. Specifically, we have empirically identified a number of risky code and metadata attributes that make a package vulnerable to supply chain attacks.

For instance, we flag inactive or unmaintained packages that no longer receive security fixes. Inspired by Android app runtime permissions, Packj uses a permission-based security model to offer control and code transparency to developers. Packages that invoke sensitive operating system functionality such as file accesses and remote network communication are flagged as risky as this functionality could leak sensitive data.

Some of the attributes we vet for, include

Attribute Type Description Reason
Release date Metadata Version release date to flag old or abandonded packages Old or unmaintained packages do not receive security fixes
OS or lang APIs Code Use of sensitive APIs, such as exec and eval Malware uses APIs from the operating system or language runtime to perform sensitive operations (e.g., read SSH keys)
Contributors' email Metadata Email addresses of the contributors Incorrect or invalid of email addresses suggest lack of 2FA
Source repo Metadata Presence and validity of public source repo Absence of a public repo means no easy way to audit or review the source code publicly

Full list of the attributes we track can be viewed at threats.csv

These attributes have been identified as risky by several other researchers [1, 2, 3] as well.

How to customize

Packj has been developed with a goal to assist developers in identifying and reviewing potential supply chain risks in packages.

However, since the degree of perceived security risk from an untrusted package depends on the specific security requirements, Packj can be customized according to your threat model. For instance, a package with no 2FA may be perceived to pose greater security risks to some developers, compared to others who may be more willing to use such packages for the functionality offered. Given the volatile nature of the problem, providing customized and granular risk measurement is one of our goals.

Packj can be customized to minimize noise and reduce alert fatigue by simply commenting out unwanted attributes in threats.csv

Malware found

We found over 40 malicious packages on PyPI using this tool. A number of them been taken down. Refer to an example below:

$ python3 main.py pypi krisqian
[+] Fetching 'krisqian' from pypi...OK [ver 0.0.7]
[+] Checking version...OK [256 days old]
[+] Checking release history...OK [7 version(s)]
[+] Checking release time gap...OK [1 days since last release]
[+] Checking author...OK [[email protected]]
[+] Checking email/domain validity...OK [[email protected]]
[+] Checking readme...ALERT [no readme]
[+] Checking homepage...OK [https://www.bilibili.com/bangumi/media/md140632]
[+] Checking downloads...OK [13 weekly]
[+] Checking repo_url URL...OK [None]
[+] Checking for CVEs...OK [none found]
[+] Checking dependencies...OK [none found]
[+] Downloading package 'KrisQian' (ver 0.0.7) from pypi...OK [1.94 KB]
[+] Analyzing code...ALERT [needs 3 perms: process,network,file]
[+] Checking files/funcs...OK [9 files (2 .py), 6 funcs, LoC: 184]
=============================================
[+] 6 risk(s) found, package is undes irable!
{
"undesirable": [
"no readme",
"only 45 weekly downloads",
"no source repo found",
"generates new code at runtime",
"fetches data over the network: ['KrisQian-0.0.7/setup.py:40', 'KrisQian-0.0.7/setup.py:50']",
"reads files and dirs: ['KrisQian-0.0.7/setup.py:59', 'KrisQian-0.0.7/setup.py:70']"
]
}
=> Complete report: pypi-KrisQian-0.0.7.json
=> View pre-vetted package report at https://packj.dev/package/PyPi/KrisQian/0.0.7

Packj flagged KrisQian (v0.0.7) as suspicious due to absence of source repo and use of sensitive APIs (network, code generation) during package installation time (in setup.py). We decided to take a deeper look, and found the package malicious. Please find our detailed analysis at https://packj.dev/malware/krisqian.

More examples of malware we found are listed at https://packj.dev/malware Please reach out to us at [email protected] for full list.

Resources

To learn more about Packj tool or open-source software supply chain attacks, refer to our

The vetting tool <g-emoji alias=rocket class=g-emoji fallback-src=https://github.githubassets.com/images/icons/emoji/unicode/1f680.png>&#128640;</g-emoji> behind our large-scale security analysis platform to detect malicious/risky open-source packages (7)

Upcoming talks

Feature roadmap

  • Add support for other language ecosystems. Rust is a work in progress, and will be available in July '22 (last week).
  • Add functionality to detect several other "risky" code as well as metadata attributes.
  • Packj currently only performs static code analysis, we are working on adding support for dynamic analysis (WIP, ETA: end of summer)

Team

Packj has been developed by Cybersecurity researchers at Ossillate Inc. and external collaborators to help developers mitigate risks of supply chain attacks when sourcing untrusted third-party open-source software dependencies. We thank our developers and collaborators.

We welcome code contributions. Join our discord community for discussion and feature requests.

FAQ

  • What Package Managers (Registries) are supported?

Packj can currently vet NPM, PyPI, and RubyGems packages for "risky" attributes. We are adding support for Rust.

  • Does it work on obfuscated calls? For example, a base 64 encrypted string that gets decrypted and then passed to a shell?

This is a very common malicious behavior. Packj detects code obfuscation as well as spawning of shell commands (exec system call). For example, Packj can flag use of getattr() and eval() API as they indicate "runtime code generation"; a developer can go and take a deeper look then. See main.py for details.

  • Does this work at the system call level, where it would detect e.g. any attempt to open ~/.aws/credentials, or does it rely on heuristic analysis of the code itself, which will always be able to be "coded around" by the malware authors?

Packj currently uses static code analysis to derive permissions (e.g., file/network accesses). Therefore, it can detect open() calls if used by the malware directly (e.g., not obfuscated in a base64 encoded string). But, Packj can also point out such base64 decode calls. Fortunately, malware has to use these APIs (read, open, decode, eval, etc.) for their functionality -- there's no getting around. Having said that, a sophisticated malware can hide itself better, so dynamic analysis must be performed for completeness. We are incorporating strace-based dynamic analysis (containerized) to collect system calls. See roadmap for details.



Packj - Large-Scale Security Analysis Platform To Detect Malicious/Risky Open-Source Packages Packj - Large-Scale Security Analysis Platform To Detect Malicious/Risky Open-Source Packages Reviewed by Zion3R on 8:30 AM Rating: 5