Using Git submodules to streamline Node.js package development

Date: Mon Dec 13 2021

When developing a Node.js package, we typically create a test application with which to exercise the package. That means for every source code change in the package, we must reinstall the package into the node_modules directory of the test application, which can be very frustrating. Git's submodule feature can streamline this by letting you directly edit both the test application and package with no reinstalls required.


Before starting, let's review the situation. Say we're developing a package named P. It's a nice little library to do something useful. We'll be using an artificial example package later. To develop the package, it's usually helpful to write a program (or two or three) to exercise the package. I'm not talking about the test suite, but an honest-to-goodness program.

In my case, the application under development is AkashaCMS, a static website generator. There are several packages in the AkashaCMS ecosystem. Most of them have unit tests in the package, but I've also created several example websites which are used for functional testing.

One of those is the akashacms-example website, which is both a working example of an AkashaCMS project and a functional test with which I develop AkashaCMS features. The akashacms-example project has been reworked to use submodules to streamline package development.

In other words, akashacms-example is an example of a program with which our package can be exercised.

Speaking generally, the application, A, must install P in its node_modules directory. We can then edit the code for A to try out functions in P. But, what happens if/when we must change the code in P?

Under normal circumstances we'd have two editor windows open, one pointing at the files in P, the other pointing at A. As soon as we edit a file in P we must copy the changed files into A/node_modules/P. This gets tedious after a while, and of course sometimes we'll forget to update the files in A and then wonder why the change we made in the code didn't change any behavior.

There is an npm command, npm link, that's meant to address this issue. As the name implies, it links the package, P, into the application's node_modules directory. We can then edit both P and A freely without reinstalling files from P into A. But whenever I read the npm link documentation, I find it confusing enough that I never use it.

Git Submodules and Node.js development

Instead, I've found Git submodules to be more straightforward. For a full tutorial on getting started with submodules, see: How and why to use Git Submodules

Briefly, Git submodules work like this:

A parent Git repository contains submodule configuration referencing one or more other Git repositories
The external repositories appear within the directory tree of the checked-out parent repository
The parent repository contains a reference to the external repository, as well as the SHA-1 hash for the commit which is to be checked out
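The submodule configuration mentioned above lives in a file named .gitmodules at the top of the parent repository. As a rough illustration, here is a minimal sketch that reads submodule entries out of .gitmodules-style text. The sample text mirrors what git submodule add writes for the package used later in this article, and parseGitmodules is a hypothetical helper written for this sketch, not part of Git or npm.

```javascript
// Sample text in the format `git submodule add` writes to .gitmodules.
const sample = [
    '[submodule "package"]',
    '\tpath = package',
    '\turl = https://github.com/robogeek/sample-package-2021-12-12.git',
].join('\n');

// Hypothetical helper: collect each [submodule "name"] section and its settings.
function parseGitmodules(text) {
    const submodules = [];
    let current = null;
    for (const rawLine of text.split(/\r?\n/)) {
        const line = rawLine.trim();
        const header = /^\[submodule "(.+)"\]$/.exec(line);
        if (header) {
            // A new section starts; later key = value lines belong to it.
            current = { name: header[1] };
            submodules.push(current);
            continue;
        }
        const setting = /^([A-Za-z][\w-]*)\s*=\s*(.*)$/.exec(line);
        if (setting && current) {
            current[setting[1]] = setting[2];
        }
    }
    return submodules;
}

const parsedSubmodules = parseGitmodules(sample);
console.log(parsedSubmodules);
```

The file records only the path and URL. The commit to check out is stored separately, as a special entry in the parent repository's tree.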

This feature seems best suited to supporting package development. Namely:

In the repository for A, configure it to embed the P repository as a submodule
A should be a test application
In the package.json in A, we use a local file reference as the dependency for P -- causing npm to make a symbolic link inside node_modules to the directory for P

That's the plan, so let's do it.

Creating two repositories

In GitHub/GitLab/Gitea/etc., create two repositories, one for the application, the other for the package. We have created two for this purpose on GitHub under the robogeek account: sample-app-2021-12-12 and sample-package-2021-12-12.

To start, we clone the sample-app repository:

$ git clone git@github.com:robogeek/sample-app-2021-12-12.git
...
$ cd sample-app-2021-12-12
$ git submodule init
$ git submodule add https://github.com/robogeek/sample-package-2021-12-12.git package
Cloning into '.../sample-app-2021-12-12/package'...
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 4 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (4/4), done.
$ git submodule update --recursive --remote --checkout

We're showing how the repositories were initialized. By the time you see this, the repositories will be fully set up and functional. In any case, we start with a blank repository, and the first thing we do is use git submodule add to bring in the sample-package repository.

Let's start by setting up a simple package in the sample-package repository:

$ cd package
$ npm init -y
...
$ git add package.json 
$ git commit -m 'Initial revision' package.json 
[main 4a14cf6] Initial revision
 1 file changed, 12 insertions(+)
 create mode 100644 package.json

This creates a blank Node.js project containing a default package.json. By default it supports CommonJS modules, and specifies index.js as the main module. Therefore, let's create a file named index.js containing this:

module.exports.hello = (txt) => {
    console.log(`Hello ${txt}`);
};

A simple package with which to get started. We'll modify this later.

Next, let's set up the sample-app directory:

$ cd ..
$ npm init -y

This sets up a blank default package.json. To it, add this:

  "dependencies": {
      "package": "./package"
  }

This dependency specifier references a local file, namely the package we just created in the package directory.

Next, create a file named app.js containing:

const P = require('package');

P.hello('There');

We've imported the package, naming it P, and called the method it exports. This is a good start for demonstrating submodules use.

Then run these commands:

$ npm install

added 1 package, and audited 3 packages in 1s

found 0 vulnerabilities

$ ls -l node_modules/
total 0
lrwxr-xr-x  1 david  admin  10 Dec 12 22:48 package -> ../package

$ node app.js 
Hello There

These commands are executed in the sample-app directory. Running npm install installs any dependencies required to run the application. The only dependency we've listed in package.json is the package directory. Therefore the only item that ends up in node_modules is that package.

Notice that package is a symbolic link to the package directory. This is the key to what we're working to create. The symbolic link lets us edit files in the package directory, and instantly have the edit reflected in the code installed for the application in node_modules.
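To see why the symbolic link matters, here is a minimal sketch that rebuilds the same layout in a throwaway temporary directory and creates the link by hand. It is a stand-in for what npm install did above, not a description of npm's internals, and the directory names and file contents are made up for the demonstration. It assumes a platform where creating directory symlinks is permitted.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { createRequire } = require('module');

// Throwaway directory laid out like sample-app: package/ and node_modules/.
const appDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sample-app-'));
const pkgDir = path.join(appDir, 'package');
fs.mkdirSync(pkgDir);
fs.mkdirSync(path.join(appDir, 'node_modules'));
fs.writeFileSync(path.join(pkgDir, 'package.json'),
    JSON.stringify({ name: 'package', version: '1.0.0', main: 'index.js' }));
fs.writeFileSync(path.join(pkgDir, 'index.js'),
    "module.exports.hello = (txt) => `Hello ${txt}`;\n");

// Hand-made equivalent of the link shown above: node_modules/package -> ../package
const linkPath = path.join(appDir, 'node_modules', 'package');
fs.symlinkSync(path.join('..', 'package'), linkPath, 'dir');

// Resolve 'package' the way a file sitting in the app directory would.
const appRequire = createRequire(path.join(appDir, 'app.js'));
const resolvedMain = fs.realpathSync(appRequire.resolve('package'));
const realMain = fs.realpathSync(path.join(pkgDir, 'index.js'));
const resolvesToPackageDir = resolvedMain === realMain;
console.log('resolves into package/:', resolvesToPackageDir);

// An edit made in package/ is exactly what node_modules/package sees.
const editedSource = "module.exports.hello = (txt) => `Hi ${txt}`;\n";
fs.writeFileSync(path.join(pkgDir, 'index.js'), editedSource);
const seenThroughLink = fs.readFileSync(path.join(linkPath, 'index.js'), 'utf8');
console.log('edit visible through link:', seenThroughLink === editedSource);

// Clean up the temporary directory.
fs.rmSync(appDir, { recursive: true, force: true });
```

There is only one copy of index.js on disk, so there is nothing to synchronize.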

The last bit demonstrates we have a working application.

Committing the initial files to the repositories

We have initial code for a package we're developing, and a sample application with which we'll test that package. This means we've reached the first milestone, a minimum viable product. It's time to commit this milestone to the repositories.

$ cd package
$ git status -s
?? index.js
$ git add .
$ git commit -a -m 'Initial Revision'
[main 5623118] Initial Revision
 1 file changed, 4 insertions(+)
 create mode 100644 index.js
$ git push
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 4 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 810 bytes | 810.00 KiB/s, done.
Total 6 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), done.
To https://github.com/robogeek/sample-package-2021-12-12.git
   c8adfa5..5623118  main -> main

We start with the package directory, adding any files which are present, and pushing them to the repository.

To use the HTTPS URL, we have configured an access token with GitHub, and have provided that token as the password.

Now, let's do the same in the sample-app directory:

$ cd ..
$ git add .

$ git status -s
A  .gitmodules
A  app.js
A  package
A  package.json

$ git commit -m 'Add submodules' .gitmodules package 
[main 53100a7] Add submodules
 2 files changed, 4 insertions(+)
 create mode 100644 .gitmodules
 create mode 160000 package

$ git commit -a -m 'Initial Revision'
[main f243c11] Initial Revision
 2 files changed, 27 insertions(+)
 create mode 100644 app.js
 create mode 100644 package.json
$ git push
Enumerating objects: 8, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 4 threads
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 1.06 KiB | 1.06 MiB/s, done.
Total 7 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), done.
To github.com:robogeek/sample-app-2021-12-12.git
   39bca02..f243c11  main -> main

We again have some files to add. Two of them, .gitmodules and package, are the housekeeping entries Git uses to record data about the submodule configuration. We committed them separately so they have an accurate commit message. We then committed the rest, and pushed everything to the repository.

Examining the application and package hierarchy together

What we've created is this:

$ tree .
.
├── README.md
├── app.js
├── node_modules
│   └── package -> ../package
├── package
│   ├── README.md
│   ├── index.js
│   └── package.json
└── package.json

3 directories, 6 files

A directory, which is a Git repository, containing a Node.js application. Contained within that is another directory, containing a Node.js package. It's not shown in this diagram, but the package directory is also a Git repository, and is managed as a submodule.

One way to understand this is with these commands:

$ git remote -v
origin  git@github.com:robogeek/sample-app-2021-12-12.git (fetch)
origin  git@github.com:robogeek/sample-app-2021-12-12.git (push)
$ cd package
$ git remote -v
origin  https://github.com/robogeek/sample-package-2021-12-12.git (fetch)
origin  https://github.com/robogeek/sample-package-2021-12-12.git (push)

The Git URL for each is different, reflecting that each is its own Git repository. Therefore when we pushed commits, each commit went into the corresponding repository. You can visit them on the repository website to see.

The purpose of creating this hierarchy is to simplify editing the files for both application and package together as one unit.

Modifying the package and application simultaneously

Now that we've got a repository with a submodule repository, and we've examined it closely, let's try modifying the package in the submodule. This was, after all, the purpose for which we brought you here to this article.

We start with a feature request from management: "MAKE A CHANGE". Sigh, those managers sometimes aren't specific with their feature requests. You go back to the manager for clarification, and she says it needs a square root function.

Doesn't the manager know that Math.sqrt does it, and we don't need such a function in the package?

In any case, what we're doing is implementing a change in a package. Of course in a real scenario it would be a useful change. But this sample application is clearly artificial, so let's just go with it.

In the package directory is our package, of course. Open up package/index.js and add the following:

module.exports.square_root = function square_root(n) {
    let _n;
    if (Number.isInteger(n)) {
        _n = Number.parseInt(n);
        if (Number.isNaN(_n)) {
            throw new Error(`${n} is an Integer that did not parse`);
        }
    } else if (Number.isNaN(Number.parseFloat(n))) {
        throw new Error(`${n} did not parseFloat`);
    } else {
        _n = Number.parseFloat(n);
    }
    return Math.sqrt(_n);
};

This fulfills the requested feature, and for good measure adds some error checking. As was said in The Elements of Programming Style, it is important to check the arguments to functions.

Since our test application is about exercising the package, let's add this in app.js:

console.log(`20: ${P.square_root(20)}`);
console.log(`0: ${P.square_root(0)}`);
console.log(`-20: ${P.square_root(-20)}`);
console.log(`twenty: ${P.square_root('twenty')}`);

That tries several important variations on the behavior of square root functions. Running the application we get this:

$ node app.js 
Hello There
20: 4.47213595499958
0: 0
-20: NaN
/Volumes/Extra/akasharender/t/sample-app-2021-12-12/package/index.js:14
        throw new Error(`${n} did not parseFloat`);
        ^

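This is the expected output. A negative number has no real square root, so Math.sqrt returns NaN for -20, and the string 'twenty' cannot be parsed as a number, so the function throws an error that ends the program. If any of that is surprising, it's suggested to revisit high school mathematics.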
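Because the uncaught error stops app.js at the last call, it can be handy to catch it while experimenting. The following is a minimal sketch that inlines a copy of the square_root function so it runs on its own. The describeResult helper is hypothetical and exists only for this illustration.

```javascript
// Copy of the package's square_root function, inlined so the sketch is standalone.
function square_root(n) {
    let _n;
    if (Number.isInteger(n)) {
        _n = Number.parseInt(n);
        if (Number.isNaN(_n)) {
            throw new Error(`${n} is an Integer that did not parse`);
        }
    } else if (Number.isNaN(Number.parseFloat(n))) {
        throw new Error(`${n} did not parseFloat`);
    } else {
        _n = Number.parseFloat(n);
    }
    return Math.sqrt(_n);
}

// Hypothetical helper: return the result as text, or the error message if it throws.
function describeResult(input) {
    try {
        return String(square_root(input));
    } catch (err) {
        return `threw: ${err.message}`;
    }
}

const inputs = [20, 0, -20, 'twenty'];
const results = inputs.map(describeResult);
inputs.forEach((input, i) => console.log(`${input}: ${results[i]}`));
```

With the try/catch in place all four cases are reported, rather than the program exiting at the first error.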

Now that we've verified the behavior using ad-hoc testing, we should go into the package directory, add a unit test directory, and add a unit test to verify this behavior. Yes, that's what we should do. But in the interest of space we'll leave that out.

What's more important is to notice how easy this was. Even though the package is installed in the node_modules directory in sample-app, we could edit its source files and they were immediately recognized.

The other win is the ease with which we can commit the code to the repositories. To prepare, run these commands:

$ git status -s
 M app.js
 m package
$ cd package
$ git status -s
 M index.js

Changes have been made in both the parent and child repositories. Let's start with the change in package:

$ git commit -a -m 'Add sqrt function'
[main fd1629b] Add sqrt function
 1 file changed, 15 insertions(+)

$ git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 4 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 324 bytes | 324.00 KiB/s, done.
Total 3 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/robogeek/sample-package-2021-12-12.git

This is normal Git practice: commit the change, then push to the repository.

Recall that the package repository is referenced using an HTTPS URL, as the earlier git remote -v output showed. As noted before, for this to work we created an access token in the GitHub account.

$ cd ..
$ git status -s
 M app.js
 M package

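Returning to the sample-app directory, we see that the lowercase m in the status has changed to an uppercase M. In Git's short status format, a lowercase m means the submodule has modified content that is not yet committed, while an uppercase M means the submodule's HEAD commit differs from the one recorded in the parent repository. The letter changed because we committed the change in package.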

What changed in package, by the way, is the SHA-1 of the latest commit in the repository. Because we've made commits in the submodule, the latest SHA-1 has to be recorded in sample-app.
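As a memory aid, here is a small sketch that explains lines of git status -s output for submodule paths. The meanings are taken from the short-format section of the git status documentation. The explainStatusLine helper is hypothetical, and it assumes you tell it which paths are submodules, since the status line alone does not say.

```javascript
// Working-tree codes that have a submodule-specific meaning in `git status -s`.
const SUBMODULE_CODES = {
    M: 'submodule HEAD differs from the commit recorded in the parent',
    m: 'submodule has modified content (uncommitted changes)',
    '?': 'submodule has untracked files',
};

// Hypothetical helper: explain one short-status line.
// Short format is two status columns, a space, then the path.
function explainStatusLine(line, submodulePaths) {
    const worktreeCode = line[1];   // second column: working tree status
    const filePath = line.slice(3);
    const isKnownCode =
        Object.prototype.hasOwnProperty.call(SUBMODULE_CODES, worktreeCode);
    if (submodulePaths.has(filePath) && isKnownCode) {
        return `${filePath}: ${SUBMODULE_CODES[worktreeCode]}`;
    }
    // Not a submodule (or an unlisted code): just echo the raw status columns.
    return `${filePath}: status "${line.slice(0, 2)}"`;
}

const submodulePaths = new Set(['package']);
const beforeCommit = explainStatusLine(' m package', submodulePaths);
const afterCommit = explainStatusLine(' M package', submodulePaths);
const regularFile = explainStatusLine(' M app.js', submodulePaths);
console.log(beforeCommit);
console.log(afterCommit);
console.log(regularFile);
```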

It's useful to run this:

$ git diff
diff --git a/app.js b/app.js
index ec46eb4..2086daf 100644
--- a/app.js
+++ b/app.js
@@ -2,3 +2,8 @@
 const P = require('package');
 
 P.hello('There');
+
+console.log(`20: ${P.square_root(20)}`);
+console.log(`0: ${P.square_root(0)}`);
+console.log(`-20: ${P.square_root(-20)}`);
+console.log(`twenty: ${P.square_root('twenty')}`);
diff --git a/package b/package
index 5623118..fd1629b 160000
--- a/package
+++ b/package
@@ -1 +1 @@
-Subproject commit 5623118d4b3621163740c84c33ca345c0c320054
+Subproject commit fd1629bad48ca6d6b5ffa5f80c8b2a5ea160dbae

Using diff we see that the SHA-1 commit hash has indeed changed.

$ git commit -a -m 'Exercise the square_root function'
[main 3a8f0d8] Exercise the square_root function
 2 files changed, 6 insertions(+), 1 deletion(-)

$ git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 4 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 421 bytes | 421.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To github.com:robogeek/sample-app-2021-12-12.git
   f243c11..3a8f0d8  main -> main

And, we can push the commit to the repository.

Didn't that also happen in a natural normal way? Wasn't this easy?

Our next task is going to the repository and examining what changed.

Checking out the sample-app repository

Because the sample-app repository has submodules, there is a little more involved in cloning it. Usually we just run git clone and we're good to go. With submodules we must ensure the submodules are also checked out.

$ git clone --recursive git@github.com:robogeek/sample-app-2021-12-12.git
Cloning into 'sample-app-2021-12-12'...
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (12/12), done.
Receiving objects: 100% (14/14), done.
Resolving deltas: 100% (4/4), done.
remote: Total 14 (delta 4), reused 9 (delta 2), pack-reused 0
Submodule 'package' (https://github.com/robogeek/sample-package-2021-12-12.git) registered for path 'package'
Cloning into '/Volumes/Extra/akasharender/t/t/sample-app-2021-12-12/package'...
remote: Enumerating objects: 13, done.        
remote: Counting objects: 100% (13/13), done.        
remote: Compressing objects: 100% (11/11), done.        
remote: Total 13 (delta 4), reused 8 (delta 2), pack-reused 0        
Receiving objects: 100% (13/13), done.
Resolving deltas: 100% (4/4), done.
Submodule path 'package': checked out 'fd1629bad48ca6d6b5ffa5f80c8b2a5ea160dbae'

The simplest approach is to add the --recursive option as shown here. Read through this output, and you'll see Git recognize the submodule package and proceed to clone that as well. And, notice that it checked out the commit hash mentioned earlier.

If we go into the submodule, we learn something interesting:

$ cd package
$ git branch
* (HEAD detached at fd1629b)
  main
$ git status
HEAD detached at fd1629b
nothing to commit, working tree clean

The file package/index.js has the square_root function, so the code is properly checked out. But the repository is in a detached HEAD state, meaning HEAD points directly at a commit rather than at a branch. In this state you'll have a hard time making changes and pushing them to the repository, because new commits don't belong to any branch and a plain git push fails with an error.

The solution for this is easy:

$ git checkout main
Switched to branch 'main'
Your branch is up to date with 'origin/main'.

$ git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

This makes it so the submodule is no longer in the detached HEAD state.
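The difference between the two states is visible in the contents of Git's HEAD file: on a branch it holds a line starting with ref:, while in the detached state it holds a bare commit hash. Here is a minimal sketch that classifies such text. The describeHead helper is hypothetical, and it works on text passed in rather than locating the file, since a submodule's Git directory is not necessarily inside the submodule's working directory.

```javascript
// Hypothetical helper: classify the text found in a Git HEAD file.
function describeHead(headText) {
    const text = headText.trim();
    // On a branch, HEAD is a symbolic reference such as "ref: refs/heads/main".
    const ref = /^ref: refs\/heads\/(.+)$/.exec(text);
    if (ref) {
        return { detached: false, branch: ref[1] };
    }
    // Detached, HEAD holds a commit hash (40 hex digits, or 64 for SHA-256 repositories).
    if (/^[0-9a-f]{40}([0-9a-f]{24})?$/.test(text)) {
        return { detached: true, commit: text };
    }
    throw new Error(`Unrecognized HEAD contents: ${text}`);
}

// State right after the recursive clone, using the commit hash shown above.
const afterClone = describeHead('fd1629bad48ca6d6b5ffa5f80c8b2a5ea160dbae\n');
// State after `git checkout main`.
const afterCheckout = describeHead('ref: refs/heads/main\n');
console.log(afterClone);
console.log(afterCheckout);
```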

What about regular applications?

Why did we suggest that the best use of submodules with Node.js is to develop a sample application for a Node.js package? What about using submodules with a regular production Node.js application?

The distinction centers on who will use that application, and their ease of setup.

Consider command-line developer tools like postcss and serverless. Both are implemented in Node.js, and the installation instruction is to run npm install -g PACKAGE-NAME, after which you can run the tool. It is important for the application provider that the installation experience is as simple as possible, presenting as few adoption hurdles as possible to their customers.

The installation procedure cannot be to first clone a Git repository, run a few extra commands, etc. It has to be simple, such as npm install, after which the customer can use the application.

The technique described above involves dependencies of this sort:

"dependencies": {
    "package-name": "./modules/package-name"
}

The directory, ./modules/package-name, is filled with code by a Git submodule. Therefore, setting up such an application requires cloning the Git repository and running Git commands before you can run npm install. Hence it is not feasible to distribute such an application through npm such that the installation experience is simply npm install.
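One way to guard against accidentally publishing such a package is to scan package.json for local path dependencies. The following is a minimal sketch, not an npm feature: findLocalDependencies is a hypothetical helper, some-registry-package is a made-up name, and the pattern covers the local path forms described in the npm documentation (./, ../, ~/, / and the file: prefix).

```javascript
// Hypothetical helper: list dependencies whose specifier points at a local path.
function findLocalDependencies(manifest) {
    const localSpec = /^(file:|\.{1,2}\/|~\/|\/)/;
    const found = [];
    for (const field of ['dependencies', 'devDependencies', 'optionalDependencies']) {
        for (const [name, spec] of Object.entries(manifest[field] || {})) {
            if (localSpec.test(spec)) {
                found.push({ field, name, spec });
            }
        }
    }
    return found;
}

// Manifest object standing in for a parsed package.json.
const manifest = {
    dependencies: {
        'package-name': './modules/package-name',
        'some-registry-package': '^1.0.0',   // made-up registry dependency
    },
};

const localDependencies = findLocalDependencies(manifest);
console.log(localDependencies);
```

A check like this could run before publishing, refusing to proceed when the list is not empty.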

NOTE: When I say npm install, I also mean the equivalent yarn command.

There are scenarios for production application delivery that do not involve an end customer running npm install. In such cases the end customer is probably not installing the application, but is instead using a web service or, in the case of an Electron application, running a prebuilt GUI.

This does not mean the technique can only be used with sample or test applications.

Consider a different scenario, a Node.js web application being delivered as a Docker container. In that case the Dockerfile will contain some commands like this:

RUN git clone --recursive CLONE-URL /app
WORKDIR /app
RUN npm install

This is roughly the same as the previous examples. We're cloning the repository into a directory in the container image, then changing to that directory, and running npm install inside the container that's being built.

The end user of this container image does not know that you used submodules. What they get is a complete image with all the required files to run the application.

Summary

We've gotten a brief introduction to using Git submodules with Node.js package development.

This lets us easily edit the source of a package, and immediately rerun a sample application with no need to synchronize any changes. That streamlines our development workflow. We can make changes in one or more submodule directories freely, then verify our changes either by running unit test suites in the submodules, or by running an ad-hoc test with the application in the parent repository.

During the writing of this article, I used the akashacms-example sample website to work on a feature change in the akashacms-dlassets package. That sample website (see the link above) is an example of a sample application with which one can exercise packages. Using it for feature development in the past was frustrating because of the need to constantly copy changes from the source directory into node_modules. Using submodules made it a breeze.

About the Author(s)

David Herron : David Herron is a writer and software engineer focusing on the wise use of technology. He is especially interested in clean energy technologies like solar power, wind power, and electric cars. David worked for nearly 30 years in Silicon Valley on software ranging from electronic mail systems, to video streaming, to the Java programming language, and has published several books on Node.js programming and electric vehicles.
