Asynchronous coding is one of those powerful tools that can bite if you're not careful. Passing around anonymous so-called "callback" functions is easy, and we do it all the time in JavaScript. Asynchronous callback functions are called when needed, as needed, sometime in the future. The key result is that code executes out of order relative to the order in which it appears in the source.
NOTE: This is an excerpt from Asynchronous JavaScript with Promises, Generators and async/await, a new book that deeply studies asynchronous programming techniques in JavaScript.
For years, JavaScript's primary use-case has been event handlers in web pages, a form of asynchronous coding. The term AJAX describes asynchronous JavaScript execution associated with retrieving data from a server. In Node.js the whole runtime library is built around asynchronous callbacks invoked by asynchronous events. It is a powerful model, indeed, but there's a slippery slope down which it's easy to slide into a tangled mess of unmaintainable code.
We must write asynchronous code in JavaScript because of its single-threaded execution model. In browser-side JavaScript, we write event handlers for events like button clicks. In server-side Node.js, we write event handlers for events like a GET request on a URL. In both cases the event handler executes in response to an event. Because it is a single-threaded environment, that event handler must execute quickly so that the event loop can continue handling other events. Event handlers that do not return quickly are said to block the event loop, which produces a bad user experience.
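To make "blocking the event loop" concrete, here is a small sketch. It uses setTimeout and a deliberate busy-wait, both chosen purely for illustration: a callback requested with a 0 ms delay still cannot run until the currently executing handler returns.

```javascript
// Records the order in which things happen on the single JavaScript thread.
const order = [];

function handleEvent(onTimer) {
  const start = Date.now();
  // Ask for a callback "as soon as possible" (0 ms delay).
  setTimeout(() => {
    order.push('timer callback');
    onTimer(Date.now() - start);
  }, 0);
  order.push('handler continues');
  // Simulate a slow handler: busy-wait for about 50 ms.
  // While this loop runs, the event loop cannot run any other callback.
  while (Date.now() - start < 50) { /* blocking the event loop */ }
  order.push('handler returns');
}

handleEvent(elapsed => {
  console.log(order);
  console.log(`the 0 ms timer actually fired after ${elapsed} ms`);
});
```

The timer callback always lands last, and its measured delay is at least as long as the busy-wait.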
In both flavors of JavaScript, the traditional way to write an asynchronous event handler is with an anonymous function. We start out by writing benign code like these snippets:
// Browser (assumes jQuery is loaded)
$('.day').click(function() {
    window.location = "com_day_graph";
});

// Node.js
const fs = require('fs');
// ...
fs.readFile('example.txt', 'utf8', function(err, txt) {
    if (err) {
        console.error(err);   // report the error
    } else {
        console.log(txt);     // do something with the text
    }
});
Each of these is pretty simple, and the idea is straightforward. You pass an anonymous function to another function so that function can call back into your code, hence the name "callback function".
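Here is a minimal sketch of the other side of that arrangement: a function that *accepts* a callback, following Node's error-first convention (callback(err) on failure, callback(null, value) on success). The countLines name and the temporary file are hypothetical, for illustration only.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: counts the newline-separated lines in a file and
// reports the result through an error-first callback, not a return value.
// (A trailing newline counts as an extra empty line in this simple sketch.)
function countLines(filename, callback) {
  fs.readFile(filename, 'utf8', function(err, txt) {
    if (err) return callback(err);          // report the error to the caller
    callback(null, txt.split('\n').length); // "call back" with the result
  });
}

// Usage: write a small file, then ask for its line count.
const sample = path.join(os.tmpdir(), 'callback-demo.txt');
fs.writeFileSync(sample, 'one\ntwo\nthree');
countLines(sample, function(err, n) {
  if (err) console.error(err);
  else console.log(`${sample} has ${n} lines`);
});
```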
As your programming confidence grows, you do more and more with callbacks. You might need to tack a new bit of functionality onto the end of an existing event handler, requiring a new callback function. Or processing a certain task may require several steps, all of which are asynchronous. Sooner or later you end up in Callback Hell, or what some call the Pyramid of Doom.
The first circle of Callback Hell is entered when your callback function needs to invoke another callback function, and you think it's easier to implement the second callback function right there, nested inside the first. JavaScript is perfectly fine with nesting functions inside other functions. We do it all the time.
Then one day you must write a task where, before reading a file, you must first verify the directory exists (using fs.stat), then read the directory (fs.readdir), look through the file list for the one or more files to read, check that each is readable (fs.stat again), finally read each file (fs.readFile), and then do something with the file content, like storing it in a database (db.query). Each stage of that means a callback function, and the code might look like this:
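const fs = require('fs');
const path = require('path');
const async = require('async'); // the Async package from npm
// Assumed: `dirname` names the directory, and `db` is a database
// client whose query() method takes an (err, results) callback.

fs.stat(dirname, function(err, stats) {
  if (err) console.error(err); // handle error
  else if (!stats.isDirectory()) {
    // Throwing here would not reach any caller, so report it instead
    console.error(new Error(`${dirname} is not a directory`));
  } else {
    fs.readdir(dirname, function(err, filez) {
      if (err) console.error(err); // handle error
      else {
        async.eachSeries(filez,
          function (filenm, next) {
            fs.stat(path.join(dirname, filenm),
              function(err, stats) {
                if (err) next(err);
                else {
                  // check stats (e.g. stats.isFile())
                  fs.readFile(path.join(dirname, filenm),
                    'utf8', function(err, text) {
                      if (err) next(err);
                      else {
                        db.query('INSERT INTO ... ', [ text ],
                          function(err, results) {
                            if (err) next(err);
                            else {
                              // do something with results,
                              // indicate success
                              next();
                            }
                          });
                      }
                    });
                }
              });
          },
          function(err) {
            if (err) {
              console.error(err);
              // do something else
            } else {
              // every file was processed
            }
          });
      }
    });
  }
});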
This can get quite daunting. Because the fs.stat, fs.readFile, and db.query operations are asynchronous, we cannot throw a simple for loop around the middle portion of this code. We need a looping construct that can handle asynchronous execution. That is what async.eachSeries does: it implements something like a while loop using asynchronously executed callback functions.
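To show what "a while loop in asynchronously executed callback functions" means, here is a minimal hand-written serial iterator in the spirit of async.eachSeries. It is not the Async package's implementation; the eachSeries function below is our own sketch, and it assumes each worker calls next exactly once.

```javascript
// A hand-written serial iterator in the spirit of async.eachSeries.
// worker(item, next) is called for one item at a time; the next item starts
// only after the worker calls next(). Calling next(err) stops the loop.
function eachSeries(items, worker, done) {
  let i = 0;
  function next(err) {
    if (err) return done(err);                 // stop at the first error
    if (i >= items.length) return done(null);  // every item processed
    worker(items[i++], next);                  // run the next "iteration"
  }
  next();
}

// Usage: the delays are longest-first, so parallel execution would finish
// in the order c, b, a. Serial execution preserves the array order.
const delays = { a: 30, b: 20, c: 10 };
const order = [];
eachSeries(['a', 'b', 'c'], (item, next) => {
  setTimeout(() => { order.push(item); next(); }, delays[item]);
}, err => {
  if (err) console.error(err);
  else console.log(order); // [ 'a', 'b', 'c' ]
});
```

The "loop counter" lives in a closure, and each iteration is triggered by the previous one's callback. A worker that calls next() synchronously would make this sketch recurse once per item, so very long arrays would need extra care.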
Another problem with this example is that we can easily get lost in error handlers while making sure the results bubble up to the right place. When there is this much error-handling code, a common mistake is to forget to implement one of the error handlers.
Asynchronous callback functions obscure the programmer's intent
This example is essentially a sequence of steps. But because of the asynchronous callback functions, we've lost track of that sequence while working on the intricacies of correctly nesting everything. The code is not written as a series of steps, and therefore the structure is all wrong for what your code describes. While it works, it's all sideways and kerflunky. Since it's a sequence of steps, shouldn't it be written as a sequence in your code?
This is the kind of code that works, but you don't want to touch it, and you hope that no bugs will be reported against it. But what if the Marketing Department wants a new feature: every Thursday the code must insert tributes to the Norse god Thor because, well, Thursday is Thor's Day? Then, a few weeks later, they want Friday's tribute to be to fish Friar's... because. In other words, this code is already a careful balancing act, and adding (or subtracting) anything is risky. Even fixing a bug is risky.
The generalized problem is this:
db.query('SELECT * FROM foobar', function(err, results) {
    if (err) {
        // errors arrive here
    } else {
        // data arrives here
    }
});
// We'd prefer both errors and data to arrive here, but the data cannot:
// it is trapped inside the callback function's scope, and that callback
// runs only after this line has already executed.
The very design of asynchronous coding in JavaScript leads us in an inconvenient direction. Errors and data end up in the wrong place. To compensate, the code inevitably takes on that pyramid shape.
But our intention is no longer clear. The intent is a simple sequence of steps, but because of the vagaries of nested callback functions, that intention is obscured.
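A small runnable sketch shows why the data cannot simply "arrive" after the call. The fakeQuery function is hypothetical: it stands in for db.query by delivering a result through a callback after a short timer.

```javascript
// Hypothetical stand-in for db.query(): it delivers its result through a
// callback later, after a short timer, like a real asynchronous query.
function fakeQuery(sql, callback) {
  setTimeout(() => callback(null, [{ id: 1 }]), 10);
}

let rows; // we'd like the data to arrive here...
fakeQuery('SELECT * FROM foobar', function(err, results) {
  if (err) return console.error(err); // ...but errors arrive here
  rows = results;                      // ...and data arrives here, later
  console.log('inside callback:', rows.length);
});
console.log('after the call:', rows); // prints "after the call: undefined"
```

The line after the call runs first, while the callback has not executed yet, so anything that needs the data must live inside the callback, which is how the nesting begins.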
Using the Async package to improve asynchronous code in JavaScript
In the current era of JavaScript, async functions make this easy. But before async functions, we had libraries that made it slightly easier to manage complex asynchronous scenarios like this one. One of them, the Async package (https://www.npmjs.com/package/async), has a large variety of asynchronous execution constructs. The async.eachSeries call shown earlier is one such construct; it acts like a while loop, executing the iteration function once for each entry in the provided array.
Over the years, programmers developed various strategies to deal with asynchronous coding in JavaScript. The Async package is a well-designed library of useful, well-tested asynchronous programming constructs that are easy to use. The solution still uses asynchronous callback functions, but there is a slight gain in clarity of purpose.
As an example, consider looping over a data structure with an asynchronous operation on each element. When using callback functions, this is impossible with a normal for or while loop, because the loop does not wait for the asynchronous operations to finish. In the Async package, loops are expressed as library function calls with several callback functions representing different stages of loop execution.
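The following sketch shows the failure mode directly. The fakeAsyncTask function is hypothetical and uses a timer to stand in for any callback-based operation: the for loop finishes immediately, and the tasks complete later, in whatever order their callbacks happen to fire.

```javascript
// Hypothetical asynchronous task: "finishes" after a delay, then calls back.
// Higher n means a shorter delay, so later tasks finish first.
function fakeAsyncTask(n, callback) {
  setTimeout(() => callback(null, n), 30 - n * 10);
}

const log = [];
for (const n of [1, 2, 3]) {
  // The loop only *starts* each task; it does not wait for the callback.
  fakeAsyncTask(n, (err, result) => {
    log.push(`task ${result} finished`);
  });
}
// Reached before any task has completed.
log.push('loop finished');

setTimeout(() => console.log(log), 50);
```

After all callbacks have fired, the log reads "loop finished" first, then tasks 3, 2 and 1: the loop neither waits for the work nor preserves its order.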
For example, an asynchronous do {..} while() loop structure might look like this:
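const async = require('async');

async.doWhilst(
    next => {
        // ... asynchronous operation
        next(); // call next when done, or next(err) on failure
    },
    callback => {
        // In Async v3 the test is itself asynchronous: report true to run
        // another iteration, false to stop. (Async v2 used a synchronous
        // test that simply returned the boolean.)
        callback(null, shouldContinue()); // shouldContinue() is a placeholder
    },
    err => {
        if (err) console.error(err.stack);
        else {
            // ... do something with the result
        }
    }
);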
While this is orderly, the intention is still obscured. Instead of the simple loop structure everyone knows, we have a function call. The reader must know the library in order to interpret this function call, and to understand that it implements a loop.
Async functions introduced into JavaScript after ES-2015/2016
Starting with ES-2015, the JavaScript language was reworked with many advanced features, and async functions with the await keyword followed in ES-2017. With these, it is easier to write asynchronous JavaScript code that is more reliable than code written with older techniques. In this book we'll explore these new features, and how they solve (er... mitigate) these problems.
Promise Chains were the first stage solution to asynchronous coding in JavaScript
The Promise object came with ES-2015 and goes a long way towards simplifying most asynchronous patterns. Using Promises, it is easy to code a "Promise Chain" that is relatively flat, is written as a series of steps, and is much easier to understand. You're still left with the problem of errors and data arriving in places other than where they belong, but the code is more orderly.
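Here is a small, runnable Promise chain as a sketch of the idea. It shows the two usual ways of adapting a callback API: a hand-written new Promise wrapper (readFileP, a name chosen for this sketch) and Node's built-in util.promisify. The temporary file is for illustration only.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const util = require('util');

// Hand-written wrapper: adapts callback-style fs.readFile to return a Promise.
function readFileP(filename, encoding) {
  return new Promise((resolve, reject) => {
    fs.readFile(filename, encoding, (err, text) => {
      if (err) reject(err);
      else resolve(text);
    });
  });
}

// Node's util.promisify builds an equivalent wrapper for error-first functions.
const statP = util.promisify(fs.stat);

const sample = path.join(os.tmpdir(), 'promise-chain-demo.txt');
fs.writeFileSync(sample, 'hello');

// A flat chain: each .then() hands a value (or a Promise) to the next step,
// and one .catch() receives an error thrown or rejected at any step.
const chain = statP(sample)
  .then(stats => {
    if (!stats.isFile()) throw new Error(`${sample} is not a file`);
    return readFileP(sample, 'utf8');
  })
  .then(text => text.toUpperCase())
  .catch(err => console.error(err));

chain.then(result => console.log(result)); // prints "HELLO"
```

Returning a Promise from inside .then() is what keeps the chain flat: the next step waits for it automatically.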
Here's an attempt to convert the pyramid shown earlier into a Promise chain. I don't guarantee complete accuracy, nor do I guarantee correctness. It's been several years since I wrote a complex Promise chain, and I'm rusty.
const fs = require('fs');
const path = require('path');
// Assumed: `dirname` names the directory, and `db` is a database
// client whose query() method takes an (err, results) callback.

new Promise((resolve, reject) => {
    fs.stat(dirname, function(err, stats) {
        if (err) reject(err);
        else resolve(stats);
    });
})
.then(stats => {
    if (!stats.isDirectory()) throw new Error(`${dirname} is not a directory`);
    // Return a Promise so the next .then() receives the file list
    return new Promise((resolve, reject) => {
        fs.readdir(dirname, function(err, filez) {
            if (err) reject(err);
            else resolve(filez);
        });
    });
})
.then(filez => {
    return Promise.all(filez.map(file => {
        return new Promise((resolve, reject) => {
            fs.stat(path.join(dirname, file), function(err, stats) {
                if (err) reject(err);
                else resolve({ file, stats });
            });
        });
    }));
})
.then(filestats => {
    return Promise.all(filestats.map(filestat => {
        return new Promise((resolve, reject) => {
            const filenm = filestat.file;
            const stats = filestat.stats;
            // fs.stat already rejected for missing files; reject non-files here
            if (!stats.isFile()) {
                return reject(new Error(`${filenm} is not a file`));
            }
            fs.readFile(path.join(dirname, filenm), 'utf8', function(err, text) {
                if (err) reject(err);
                else resolve({ filenm, text });
            });
        });
    }));
})
.then(filetexts => {
    return Promise.all(filetexts.map(filetext => {
        return new Promise((resolve, reject) => {
            db.query('INSERT INTO ... ', [ filetext.text ], function(err, results) {
                if (err) reject(err);
                else {
                    // do something with results, indicate success
                    resolve(results);
                }
            });
        });
    }));
})
.catch(err => {
    // Deal with whatever errors occurred in any step above
    console.error(err);
});
This is possibly an improvement over the pyramid structure shown earlier. A few years ago, before async functions were available in Node.js, this felt like the best choice. The best we can say about it is that it's flatter than the Pyramid of Doom. But it is still complex to write and debug. More importantly, the programmer's intention is less obscured, but it is still obscured.
Another ES-2015 feature, Generators, offered a way forward. When used directly, Generator functions are complex. But there was a library, the co
library, which performed asynchronous magic allowing us to write clear code which matches our intention. This library gave us a foretaste of what async functions would be like.
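The co library's own code is not reproduced here. Instead, here is a minimal sketch of the idea behind it, using a hypothetical run() helper: a generator yields Promises, and the runner resumes the generator with each resolved value, or throws each rejection back into it.

```javascript
// Minimal co-style runner (a sketch, not the co library's implementation).
// Drives a generator: each yielded value is treated as a Promise, and the
// generator is resumed with its result (next) or its error (throw).
function run(genFn, ...args) {
  return new Promise((resolve, reject) => {
    const gen = genFn(...args);

    function step(method, input) {
      let result;
      try {
        result = gen[method](input);   // resume the generator
      } catch (err) {
        return reject(err);            // the generator threw and didn't catch
      }
      if (result.done) return resolve(result.value);
      Promise.resolve(result.value).then(
        value => step('next', value),  // send the resolved value back in
        err => step('throw', err)      // raise the rejection at the yield
      );
    }

    step('next', undefined);
  });
}

// Usage: the generator body reads like synchronous code.
const delay = (ms, value) => new Promise(resolve => setTimeout(() => resolve(value), ms));

run(function* () {
  const a = yield delay(10, 1);
  const b = yield delay(10, 2);
  return a + b;
}).then(sum => console.log(sum)); // prints 3
```

Replace function* with async function and yield with await, and you essentially have the syntax that async functions later made native.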
JavaScript async functions are an excellent solution for asynchronous operations
The async function is a type of function that lets us use the await keyword to manage asynchronous execution. Conceptually, async functions combine the ideas of Generator functions and Promises, with a lot of magic happening in the handling of the await keyword. It is very important to understand the Promise object, but not so necessary to understand Generator functions.
We start by prefixing a function declaration with the async keyword. This tells JavaScript to support the await keyword inside the function. An async function always returns a Promise.
Inside an async function, the await keyword lets you easily write code that appears to be synchronous, while under the covers Promise objects manage the asynchronous execution. The result is an amazing breath of fresh air.
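A short sketch of the two rules just described: an async function always returns a Promise, and inside it await turns a rejected Promise into an ordinary exception. The function names are illustrative only.

```javascript
// An async function always returns a Promise, even if it returns a plain value.
async function double(n) {
  return n * 2;
}

// await pauses this function (not the whole program) until the Promise settles.
// A rejected Promise surfaces as an ordinary exception that try/catch handles.
async function main() {
  const value = await double(21);
  try {
    await Promise.reject(new Error('failed'));
  } catch (err) {
    return `${value} / ${err.message}`;
  }
}

console.log(double(21) instanceof Promise); // prints true
main().then(msg => console.log(msg));       // prints "42 / failed"
```

The error handling is a plain try/catch wrapped around the awaited call, in the same place the result is used.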
It is a huge improvement over the older callback-oriented paradigm. It's such a big improvement that the whole JavaScript ecosystem should move to the Promise-based async function paradigm. This new paradigm makes code easier to read, improves error handling and helps to get rid of nasty “callback trees”. It's not a panacea, of course, and comes with some baggage of its own, but the Promise-based paradigm is so compelling it's worth the effort to make the change.
The earlier example could be rewritten as follows:
const fs = require('fs/promises');
const path = require('path');
// ...
// Assumed: `db` is a database client whose query() takes a callback

async function doAsynchronously(dirname) {
    const stats = await fs.stat(dirname);
    if (!stats.isDirectory()) {
        throw new Error("Not a directory: " + dirname);
    }
    const filez = await fs.readdir(dirname);
    for (const filenm of filez) {
        const filestats = await fs.stat(path.join(dirname, filenm));
        if (!filestats.isFile()) {
            // maybe instead, log an error and continue
            throw new Error("Not a file: " + filenm);
        }
        const text = await fs.readFile(path.join(dirname, filenm), 'utf8');
        await new Promise((resolve, reject) => {
            db.query('INSERT INTO ... ', [ text ],
                function(err, results) {
                    if (err) reject(err);
                    else {
                        // do something with results, indicate success
                        resolve(results);
                    }
                });
        });
    }
}
What a breath of fresh air!
Gone is the complex pyramid structure, and the equally complex Promise chain. Our intention is very clear. Errors and results are both handled naturally. The only issue is that we're assuming db.query does not support returning a Promise, and therefore we are wrapping it with a Promise object. If instead it returned a Promise, we could use await db.query to handle it more naturally.
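If db.query follows Node's error-first (err, results) callback convention, the hand-written Promise wrapper can be replaced with util.promisify. The sketch below assumes exactly that. The db object is a hypothetical in-memory stand-in, not a real driver, and the SQL string and function names are illustrative. It also takes the "log and continue" option mentioned in the comment above, skipping non-files instead of throwing.

```javascript
const fs = require('fs/promises');
const os = require('os');
const path = require('path');
const util = require('util');

// Hypothetical stand-in for a callback-based database client: it only
// remembers the parameters it was given. A real driver would differ.
const db = {
  rows: [],
  query(sql, params, callback) {
    setImmediate(() => {
      this.rows.push(params);
      callback(null, { affectedRows: 1 });
    });
  }
};

// util.promisify adapts the (err, results) callback into a Promise;
// bind(db) keeps `this` pointing at db when query() runs.
const query = util.promisify(db.query).bind(db);

async function importDirectory(dirname) {
  const stats = await fs.stat(dirname);
  if (!stats.isDirectory()) throw new Error(`Not a directory: ${dirname}`);
  let inserted = 0;
  for (const filenm of await fs.readdir(dirname)) {
    const filepath = path.join(dirname, filenm);
    const filestats = await fs.stat(filepath);
    if (!filestats.isFile()) continue;             // skip subdirectories etc.
    const text = await fs.readFile(filepath, 'utf8');
    await query('INSERT INTO texts VALUES (?)', [ text ]); // hypothetical SQL
    inserted++;
  }
  return inserted;
}

// Usage: create a temporary directory holding two files, then import it.
async function demo() {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'async-demo-'));
  await fs.writeFile(path.join(dir, 'a.txt'), 'alpha');
  await fs.writeFile(path.join(dir, 'b.txt'), 'beta');
  const inserted = await importDirectory(dir);
  console.log(`inserted ${inserted} rows`); // prints "inserted 2 rows"
  return inserted;
}

demo().catch(err => console.error(err));
```

Any rejection, whether from fs or from the wrapped query, propagates out of importDirectory like a thrown exception, so the caller needs only one .catch() or try/catch.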
This landscape is discussed in a new book: Asynchronous JavaScript with Promises, Generators and async/await. The book goes deeply into several ways to tame asynchronous coding in JavaScript. The above is an excerpt from the first chapter.