Tags: Node.js » Copy on Write
It's been refreshing to learn of an advance in Linux/Unix file system semantics over the last 35 years. In 1984 I first used symbolic links, on a Vax running 4.2BSD. That was a significant advance over the hard links that had existed for many years before. The other day I learned that both Mac OS X and Linux support a new kind of link, a reflink, that is a form of copy-on-write for certain file systems.
If, like me, you're not entirely clear on the meaning of copy-on-write, let's explore that first. I vaguely recall CoW from operating system memory management, where it is used with shared memory segments.
Copy-on-write (CoW or COW) is a resource-management technique used in computer programming to efficiently implement a "duplicate" or "copy" operation on modifiable resources. If a resource is duplicated but not modified, it is not necessary to create a new resource; the resource can be shared between the copy and the original. Modifications must still create a copy, hence the technique: the copy operation is deferred to the first write. By sharing resources in this way, it is possible to significantly reduce the resource consumption of unmodified copies, while adding a small overhead to resource-modifying operations. Source: Wikipedia
What this means is that you have a resource, such as a shared memory segment or a file in the file system. Instead of copying the resource, you create a duplicate that merely references the original, which can be very quick. The duplicate is set up so that the first modification triggers an actual copy of the affected portion, and the modification is applied to that private copy, leaving the original untouched.
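To make that description concrete, here is a toy sketch of the idea in JavaScript. The `CowBuffer` class is purely illustrative, not part of any real API: a clone shares the original's underlying `Buffer` until one side writes, at which point that side takes a private copy.

```javascript
// Toy copy-on-write container: clones share the underlying Buffer
// until one side writes, at which point that side takes a private copy.
class CowBuffer {
  constructor(data) {
    this.data = data;
    this.shared = true; // conservatively assume the buffer may be shared
  }
  clone() {
    // Cheap: no bytes are copied; both objects point at the same Buffer.
    this.shared = true;
    return new CowBuffer(this.data);
  }
  read(i) {
    return this.data[i];
  }
  write(i, value) {
    if (this.shared) {
      // The deferred copy happens here, on the first write.
      this.data = Buffer.from(this.data);
      this.shared = false;
    }
    this.data[i] = value;
  }
}

const original = new CowBuffer(Buffer.from('hello'));
const duplicate = original.clone();       // instant, regardless of size
duplicate.write(0, 'H'.charCodeAt(0));    // triggers the real copy
console.log(original.data.toString());    // 'hello' -- unchanged
console.log(duplicate.data.toString());   // 'Hello'
```

A reflink applies the same deferral at the file system level, where the shared unit is a file's data extents rather than a Buffer.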
Maybe that attempt to explain was still murky, so let's try some explicit examples.
To create a reflink clone in Node.js, the fs.copyFile function is used with a specific option. Source code for this is at the bottom of this article, but first let's go over some background material.
To create a reflink copy/duplicate of a file:
$ time cp --reflink wikidatawiki-stub-articles.xml w.xml
real 0m0.008s
user 0m0.004s
sys 0m0.000s
$ ls -l wikidatawiki-stub-articles.xml w.xml
-rw-rw-r-- 1 david david 39478588202 Jul 21 22:00 wikidatawiki-stub-articles.xml
-rw-rw-r-- 1 david david 39478588202 Jul 25 22:00 w.xml
The file in question is 39 gigabytes. This is on a Linux box with an XFS file system created with the reflink option enabled. For setup instructions, see How to format a drive on Ubuntu with the XFS file system and reflinks support.
On Mac OS X the command would instead be cp -c, which uses the clonefile system call.
To clone this 39 gigabyte file took a tiny fraction of a second.
Think about that - 39 gigabytes of data duplicated in the blink of an eye. I know from having copied this file several times that a normal file copy of 39 gigabytes requires 15 minutes. Fifteen minutes versus a fraction of a second is a huge speedup.
Since it's not practical to edit a 39 gigabyte XML file, let's demonstrate copy-on-write using a smaller file.
$ ls -l sample.text
-rw-rw-r-- 1 david david 1393 Jul 25 22:08 sample.text
This is a file with Lorem Ipsum text.
$ cp --reflink sample.text sample-dup.text
$ vi sample-dup.text
$ ls -l sample*
-rw-rw-r-- 1 david david 1403 Jul 25 22:09 sample-dup.text
-rw-rw-r-- 1 david david 1393 Jul 25 22:08 sample.text
$ diff -u sample.text sample-dup.text
--- sample.text 2019-07-25 22:08:43.809680598 -0700
+++ sample-dup.text 2019-07-25 22:09:24.073194632 -0700
@@ -3,3 +3,5 @@
Etiam tempor orci eu lobortis elementum nibh tellus molestie. Neque egestas congue quisque egestas. Egestas integer eget aliquet nibh praesent tristique. Vulputate mi sit amet mauris. Sodales neque sodales ut etiam sit. Dignissim suspendisse in est ante in. Volutpat commodo sed egestas egestas. Felis donec et odio pellentesque diam. Pharetra vel turpis nunc eget lorem dolor sed viverra. Porta nibh venenatis cras sed felis eget. Aliquam ultrices sagittis orci a. Dignissim diam quis enim lobortis. Aliquet porttitor lacus luctus accumsan. Dignissim convallis aenean et tortor at risus viverra adipiscing at.
+MODIFIED
+
We created a reflink duplicate of the file, then edited the duplicate. The two files are now different sizes, and diff shows a difference between them. Therefore sample-dup.text is now a true copy of sample.text.
If these files had been hard linked or symlinked instead, creating the link would have been just as fast, but an edit through either name would have changed the shared content, because both names refer to the same underlying file.
$ cp sample.text sample2.text
$ ln sample2.text sample2-link.text
$ ls -l sample2*
-rw-rw-r-- 2 david david 1393 Jul 25 22:50 sample2-link.text
-rw-rw-r-- 2 david david 1393 Jul 25 22:50 sample2.text
$ vi sample2-link.text
$ ls -l sample2*
-rw-rw-r-- 2 david david 1402 Jul 25 22:50 sample2-link.text
-rw-rw-r-- 2 david david 1402 Jul 25 22:50 sample2.text
$ diff -u sample2*
To demonstrate the obvious, we made a copy of the file, made a hard link to the copy, and edited through the hard link. Because this is how hard links work, the modification shows up under both names, and diff reports no difference.
What does this mean?
What we've demonstrated is that with reflinks what's effectively a copy of a file can be created very quickly, and consume a negligible amount of disk space. The cloned copy continues to consume negligible disk space until you modify the cloned copy, at which point it becomes a regular copy of the file.
Maybe this seems arcane but consider one possible use case.
A software development project could be managed with this capability. Instead of a source code management system (Git, Mercurial, CVS, etc.) you'd have parallel directories. With the GNU cp program used on Linux you can clone a whole directory structure very quickly. A project team could use a series of parallel directories, each a copy-on-write duplicate, to manage the source tree.
$ git clone https://github.com/nodejs/node.git
Cloning into 'node'...
remote: Enumerating objects: 8, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 521860 (delta 1), reused 3 (delta 1), pack-reused 521852
Receiving objects: 100% (521860/521860), 462.85 MiB | 1013.00 KiB/s, done.
Resolving deltas: 100% (387209/387209), done.
Checking out files: 100% (31537/31537), done.
$ du -sk node
913584 node
$ find node -type f -print | wc -l
31561
$ time cp --archive --reflink node node-dup
real 0m1.860s
user 0m0.317s
sys 0m1.306s
Using the Node.js Git repository as an example: 913 megabytes of source code, over 31,000 files, and a reflink clone of the entire directory structure takes 1.8 seconds. In the clone we could edit any file without changing the original.
In theory we could use this to handle source tree revisions. Of course, I don't envision software engineers giving up SCM systems like Git, so this particular example may not be compelling, but perhaps there are other use cases that are.
For example what if a word processing or image manipulation program created a Copy-on-Write clone every time you edit a file? The program could have a UI to navigate through the clones so you could revert to an earlier version of the file if needed.
Operating system support
The reflinks/clonefile feature is not available for every operating system.
On Mac OS X the feature requires the APFS file system.
On Linux it requires XFS (formatted with reflink support), Btrfs, or another file system that implements reflinks, such as OCFS2.
On Windows -- I have no idea.
Implementing reflinks in Node.js
// reflink.js - clone a file using a reflink where supported
const fs = require('fs');
const process = require('process');

// fs.constants.COPYFILE_FICLONE asks the OS for a copy-on-write clone;
// if the file system does not support reflinks, Node falls back to a
// regular copy. (COPYFILE_FICLONE_FORCE would fail instead.)
fs.copyFile(
    process.argv[2],   // source file
    process.argv[3],   // destination file
    fs.constants.COPYFILE_FICLONE,
    (err) => {
        if (err) {
            console.error(err);
            process.exit(1);
        }
    }
);
In production code this would typically use the promise-based API inside an async/await function. The key is to call copyFile with the fs.constants.COPYFILE_FICLONE constant.
The timing is:
$ time node reflink.js wikidatawiki-stub-articles.xml foo.xml
real 0m0.047s
user 0m0.036s
sys 0m0.012s
Or ... about the same time as required by cp --reflink.