Distributing Node.js modules, publicly or privately, without using the npm repository

Date: 2014-09-07 11:51

Tags: Node.JS

The default assumption for distributing a Node.js module is to publish it in the public npm registry. It's a simple declaration in the package.json, and then you tell your customers to simply type "npm install". The public npm registry takes care of the details, and you can even use versioning to make sure your customers use tested module versions. But what if you don't want to publish modules in the public npm registry? Maybe your modules are proprietary, or maybe the modules are too intertwined with a parent product to be of general use?

The npm registry is getting filled with modules, some of which are helper modules for specific parent products. Ideally, modules listed in the public npm registry are of general use and can be used with any project. But, browsing the public npm registry we see many modules that are specific to a certain parent product.

An example, picked at random, is assemble-markdown-data whose description reads "An Assemble plugin for automatic parsing of markdown." Assemble is some kind of static site generator for use in grunt.js, so we have a plugin for Assemble which is taking up an entry in the npm registry. While the npm registry can probably handle zillions of entries, it means anyone looking for markdown modules will have that many more modules to evaluate. Oh, and to be fair, for my own product, (akashacms.com) AkashaCMS, I published all the plugin modules into the npm registry. I'm looking at myself as much as the other people, and wondering if this is the best idea.

For privately published modules, the usual recommendation is to run a private npm registry. That lets you cache packages from the public npm registry locally, and it also gives you a place to publish company-private modules.

It's not necessary to run a private npm registry, and it's not necessary to publish all your modules into the public npm registry.

The default preferred package.json usage, for declaring dependencies, results in every module being published into the npm registry. The default best practice for the dependencies field in the package.json is:

{ "dependencies" :
  { "foo" : "1.0.0 - 2.9999.9999"
  , "bar" : ">=1.0.2 <2.1.2"
  , "baz" : ">1.0.2 <=2.3.4"
  }
}

The version number specifiers reference the package.json version in the npm registry. It's so easy to just go ahead and write the dependencies this way, and then publish all the dependencies into the npm registry. As I said, that leads to bloat of the npm registry and the rest of us have to evaluate more and more modules before finding the right candidate.

But there's another way, because the dependencies format allows you to specify a URL or git repository reference or even a github reference. It's all specified in the (www.npmjs.org) npm package.json documentation, so let's do a quick review.

First off is a version number dependency, as shown above. The version numbering scheme is documented in the (www.npmjs.org) semver page. These always refer to modules published in the npm registry.

Next is to simply specify a URL.

{ "dependencies" :
  { ...  "asd" : "http://asdf.com/asdf.tar.gz" ... }
}

The documentation says "You may specify a tarball URL in place of a version range. This tarball will be downloaded and installed locally to your package at install time." That's simple and straightforward. Since it says "tarball" that means it supports .tar.gz bundles, and perhaps doesn't support other packaging formats like .zip. That's fine with me if it only supports .tar.gz but those other packaging formats do exist because other people prefer them, and it would be nice if the documentation explicitly said so.

Next, there are two ways to specify git URLs. One is a full URL, which can include a #tag suffix naming a tag, a branch, or a commit. The other is the shorthand user-name/repository-name reference for a GitHub repository. To reference a tag in a GitHub repository, use: user-name/repository-name#tag-name
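As a rough sketch of the two git forms side by side, a dependencies section might look like this. The module names, the example-user account, and the tag names are all made up for illustration. The top-level "//" entry is just a commonly used way to leave a note in a file format that has no comment syntax.

```json
{
  "//": "Hypothetical module names, GitHub account and tags, for illustration only",
  "dependencies": {
    "my-plugin": "git+https://github.com/example-user/my-plugin.git#v1.2.0",
    "my-helper": "example-user/my-helper#v0.3.1"
  }
}
```

The first entry is the full git URL form and the second is the GitHub shorthand. In both, the part after the # names what to check out.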

What this means is that Node.js developers have three ways of declaring a dependency on a module stored in a public location, without being forced to publish the module to the public npm registry. Further, these methods let you keep a module private inside your company, without running a private npm registry and without putting the module in a public location.
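For the company-private case, one possible arrangement is sketched below. It assumes an internal git server reachable over SSH and an internal web server for tarballs. The host names git.example.internal and files.example.internal, the module names, and the versions are all hypothetical, and the "//" entry is only a note.

```json
{
  "//": "Hypothetical internal hosts; whoever runs npm install needs access to them",
  "dependencies": {
    "billing-core": "git+ssh://git@git.example.internal/platform/billing-core.git#v2.0.0",
    "report-templates": "https://files.example.internal/npm/report-templates-1.4.2.tar.gz"
  }
}
```

With a setup like this, access control is whatever the git server and web server already enforce, so the modules never leave the company network.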

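You don't have to give up version numbering either. With both the "tarball URL" and "git repository URL" methods you can reference a specific version: a git URL can name a version tag after the #, and a tarball URL can point at a file whose name carries the version number.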
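To make the version point concrete, here is a small sketch contrasting a fixed tag, a branch, and a versioned tarball file name. Every name, tag, and URL in it is a placeholder I made up, and the "//" entry is only a note.

```json
{
  "//": "Hypothetical names; the tag, branch and file names are placeholders",
  "dependencies": {
    "fixed-by-tag": "example-user/fixed-by-tag#v1.0.2",
    "follows-branch": "example-user/follows-branch#master",
    "fixed-by-filename": "http://example.com/downloads/fixed-by-filename-1.0.2.tar.gz"
  }
}
```

A tag or a versioned file name identifies one specific release, while a branch name refers to whatever that branch currently contains. Moving to a newer version then means editing the tag or file name in package.json.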

UPDATE: Since writing the above, the npm repository was struck by the "left-pad issue". The left-pad package is tiny, but was widely used, including by some critical tools like Babel. One day the author of left-pad, unhappy with npm Inc's behavior towards him, decided to unpublish all his packages from the npm repository. That meant left-pad disappeared, which broke a bunch of other tools, including Babel.

To fix the world, npm Inc quickly republished the left-pad package and all was fine. Since then, npm Inc changed its policies so that we can no longer freely unpublish packages. Apart from narrow exceptions, once a package has been published it remains in the repository forever. That raises the spectre of npm Inc, 20 years from now, having to keep around packages that haven't been updated for 19 years, just because someone published them one day.

While I understand the rationale of the decision, I want to be able to unpublish packages that should no longer be distributed. As the owner of a package, I believe it is my right to unpublish it from the repository. But clearly npm Inc has chosen a different policy.

The point being that some of us may want to distribute code using mechanisms other than the npm repository. Fortunately, package.json gives us the freedom to do so.