Avoiding rogue Node.js packages by using good version dependencies in package.json


David Herron

Date: Sat Jan 22 2022

Tags: Node.js, Node.js Installation

Several times now, widespread failures in Node.js applications were due to Node.js packages containing malware. If bad code makes it into our applications, what sort of trouble can ensue? That bad code has the same privileges your application has. What if the bad code isn't immediately obvious, but is stealthy malware that steals data?

As software professionals, our duty is ensuring our applications have as few problems as possible. We write unit tests, we hire a team of quality engineers to focus on product quality, we have bug reporting systems so our customers can inform us about problems, and ideally we responsibly deal with all such reports by fixing related bugs.

But then, one day we'll type "npm install && npm update" and suddenly our application isn't working. And that's if we're lucky, because then the problem is obvious. Sometimes the problem is hidden: some 3rd party packages have been implicated in stealing data or running cryptocurrency mining code.

The current example is that in early January 2022, a disgruntled package maintainer updated two packages - colors and faker - to contain malware, which caused applications to visibly misbehave, and to include messages about Aaron Swartz. The developer had previously written about "Fortune 500's" taking free work from open source programmers, and not supporting them. The packages in question are widely used, and the misbehaving updated code caused widespread problems.

This isn't the first time widespread problems were vectored through packages in the npm registry. And, the problem isn't limited to Node.js packages. For example, our brethren Java developers are facing a widespread problem due to the Log4J library containing security vulnerabilities. At least twice over the last year The White House has warned about security vulnerabilities in open source software (June 2021), and has begun working with the software industry to improve open source software security (January 2022).

A few examples in the Node.js ecosystem are:

January 2022 - The developer of colors and faker introduced obvious malware into these packages, as noted above.
December 2021 - A malicious npm package used by Discord was found stealing credentials from Discord users.
October 2021 - Three malicious npm packages were found to be running cryptocurrency mining code.
September 2021 - A critical security bug was reported in a widely used package.
March 2016 - A package developer was caught in a name dispute over one of his packages, and in a fit of anger removed all his packages from the npm repository. One of those, left-pad, was widely used, and its removal broke a wide range of applications.

This makes it clear that the npm package repository is open to problems from packages that either purposefully or inadvertently have serious problems or malware. The question is - how do we avoid these problems?

One common problem in each case has to be the package dependencies listed in package.json. The dependencies field gives us, as a convenience, the option of allowing a range of package versions to be used. The assumption seems to be that most package developers are honest, and if version N is okay then version N+1 will also be okay.

The experience with colors and faker, and for that matter left-pad, says that's not a safe assumption to make.

It's important to carefully test your application in a way that also exercises the 3rd party packages. But testing alone is not enough. We must also be careful with the version numbers in package dependencies.

The principle is - once you have tested your application against version N, that is the only version you know works, and you do not allow the application to be used against a different version until you have tested using that other version.

The dependencies field in package.json is how we declare package dependencies. This is an excellent way to automatically assemble code into a working project, and the dependencies field gives us a lot of flexibility. We can load packages directly from an HTTP URL, for example, completely bypassing a central repository like npm.

But consider these npm package dependency version specifiers: >version, >=version, ~version, ^version, 1.2.x, *, and the empty string.

This style of dependency declaration allows installation of package versions which you did not test. Presumably you tested your application against one specific version of each dependency. Does that mean version + 1 will also work, and you can allow your application to use it even though you haven't tested against that version?

These specifiers are a convenience, and for most packages new releases do not introduce breaking changes or malware. We can specify 1.2.x and know that our application will automatically update to the latest within the 1.2 series of releases. In most cases that has worked great, and we get to go on with life. That is, until one day a disgruntled developer posts version 1.2.42 containing malware.

Any inexact package dependency is open to problems of this sort.
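To make the 1.2.x case concrete, here is a minimal sketch of how such a range admits a release we never tested. The matchesXRange function is a hypothetical helper written for this illustration, and it only understands ranges of the form MAJOR.MINOR.x. The real matching in npm is done by its semver library and handles many more forms.

```javascript
// Simplified sketch: only handles x-ranges of the form "MAJOR.MINOR.x".
// npm's real range matching covers many more specifier forms.
function matchesXRange(version, range) {
  const [vMajor, vMinor, vPatch] = version.split(".");
  const [rMajor, rMinor, rPatch] = range.split(".");
  if (rPatch !== "x") {
    throw new Error("this sketch only understands ranges like 1.2.x");
  }
  // Major and minor must be identical; any numeric patch level is accepted.
  return vMajor === rMajor && vMinor === rMinor && /^\d+$/.test(vPatch);
}

// The version we tested against, and a later release we never saw.
const tested = "1.2.41";
const untested = "1.2.42";

console.log(matchesXRange(tested, "1.2.x"));   // true
console.log(matchesXRange(untested, "1.2.x")); // true, so it installs untested
console.log(matchesXRange("1.3.0", "1.2.x"));  // false
```

Nothing in the range distinguishes the release we tested from the one we did not. Both are equally acceptable to the installer.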

The cure is to use explicitly exact version numbers in dependencies. This means using version rather than something like >version.
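As a rough sketch of telling the two styles apart, the following classifies each of the specifier forms mentioned earlier. The isExactVersion function and its regular expression are assumptions made for this illustration, not part of npm. It deliberately treats anything other than a plain MAJOR.MINOR.PATCH version (optionally with a prerelease or build suffix) as inexact, which also flags forms such as =1.2.3 or URLs that a fuller check might handle differently.

```javascript
// Sketch: a specifier counts as exact only if it is a plain
// MAJOR.MINOR.PATCH version, optionally with a prerelease or build suffix.
const EXACT = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

function isExactVersion(spec) {
  return EXACT.test(spec.trim());
}

// Two exact forms, followed by the inexact forms listed earlier.
const specs = [
  "1.2.3", "1.0.0-rc.10",
  ">1.2.3", ">=1.2.3", "~1.2.3", "^1.2.3", "1.2.x", "*", "",
];

for (const spec of specs) {
  console.log(JSON.stringify(spec), isExactVersion(spec) ? "exact" : "inexact");
}
```

Only the first two entries are reported as exact. Every other form leaves the installer free to pick a version you have not tested.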

I think we're all guilty of this, and I sure am. Here are the dependencies of one of the applications I've created:

"dependencies": {
    "archiver": "^5.3.x",
    "cheerio": "1.0.0-rc.10",
    "chokidar": "^3.5.2",
    "commander": "^8.3.x",
    "globfs": "^0.3.3",
    "js-yaml": "^4.1.x",
    "mime": "^3.x",
    "sprintf-js": "^1.1.2",
    "text-statistics": "0.1.1",
    "unzipper": "^0.10.x",
    "uuid": "^8.3.x",
    "xmldom": "^0.6.x"
}

Clearly I've just given myself a task to clean this up. That's 10 dependencies using loose inexact version numbers. Any of those packages could become a problem. For each, an update could be shipped containing a breaking change that I won't see because I haven't tested against that version.
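One way to approach the cleanup is to replace each range with the version that is actually installed, since that is the version the tests ran against. The sketch below assumes the installed versions are supplied in the shape printed by npm list --depth 0 --json, namely an object with a dependencies map whose entries carry a version field. The pinDependencies function is a hypothetical helper, and the installed versions in the sample are illustrative rather than taken from a real project.

```javascript
// Sketch: replace each declared range with the version actually installed.
// `installed` is assumed to look like the output of `npm list --depth 0 --json`:
//   { dependencies: { name: { version: "x.y.z" } } }
function pinDependencies(declared, installed) {
  const pinned = {};
  for (const [name, spec] of Object.entries(declared)) {
    const found = installed.dependencies && installed.dependencies[name];
    // Keep the original specifier when the package is not installed,
    // so that nothing is silently dropped.
    pinned[name] = found && found.version ? found.version : spec;
  }
  return pinned;
}

// Three entries from the list above; the installed versions are illustrative.
const declared = {
  chokidar: "^3.5.2",
  globfs: "^0.3.3",
  "sprintf-js": "^1.1.2",
};
const installed = {
  dependencies: {
    chokidar: { version: "3.5.2" },
    globfs: { version: "0.3.3" },
  },
};

console.log(pinDependencies(declared, installed));
```

In this sample sprintf-js keeps its original range because no installed version was supplied for it, which makes the gap visible instead of hiding it. When adding a new package, npm install also accepts the --save-exact option, which records an exact version in package.json from the start.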

What are we trying to avoid by declaring dependencies in this way? It's about lowering our administrative overhead. This is 10 packages where I must somehow track their status, then update from time to time. Each time, one must check for updates to each package. More importantly, how might we be automatically warned about security vulnerabilities?

Some available tools include

The npm audit command consults a database of known security vulnerabilities - for example xmldom has a known vulnerability, but no known fix
npm-check-updates lists dependencies for which updates are available
npm-check also lists dependencies for which updates are available. This scans not only package.json but the source code for references in require and import statements.
The npm list --depth 0 command lists packages your application directly depends on
The npm outdated command lists packages for which updates are available
The npm cache clean --force command cleans out the npm cache, which can be useful while doing updates
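With exact versions in place, these tools become the way we learn that an update exists. As a small sketch, the following summarizes a report in the shape printed by npm outdated --json, assumed here to be an object keyed by package name with current, wanted and latest fields. The summarizeOutdated function is a hypothetical helper, and the sample data is invented for illustration rather than real npm output.

```javascript
// Sketch: summarize a report shaped like the output of `npm outdated --json`.
// Assumed shape: { name: { current, wanted, latest, ... } }
function summarizeOutdated(report) {
  return Object.entries(report).map(([name, info]) => ({
    name,
    current: info.current,
    latest: info.latest,
    // A change in the first number signals a potentially breaking release.
    majorChange:
      String(info.current).split(".")[0] !== String(info.latest).split(".")[0],
  }));
}

// Illustrative data, not real npm output.
const sample = {
  "example-a": { current: "1.2.3", wanted: "1.2.9", latest: "2.0.0" },
  "example-b": { current: "0.3.3", wanted: "0.3.4", latest: "0.3.4" },
};

for (const row of summarizeOutdated(sample)) {
  const flag = row.majorChange ? " (major)" : "";
  console.log(`${row.name}: ${row.current} -> ${row.latest}${flag}`);
}
```

A summary like this gives a short list of packages to review and test, one deliberate update at a time.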

We only have control over the direct dependencies listed in our package.json. The issues raised here can arise from nested dependencies, where one of the packages we depend on depends on a broken package. That means getting maintainers of intermediate packages interested in updating their dependencies.

About the Author(s)

David Herron : David Herron is a writer and software engineer focusing on the wise use of technology. He is especially interested in clean energy technologies like solar power, wind power, and electric cars. David worked for nearly 30 years in Silicon Valley on software ranging from electronic mail systems, to video streaming, to the Java programming language, and has published several books on Node.js programming and electric vehicles.
