Tags: MongoDB
The task of generating a sequence of integers, such as identifiers for documents in a MongoDB collection, is conceptually. But, like many seemingly simple coding tasks, it's easy to end up confronting an obtuse error message like "RangeError: Maximum call stack size exceeded".

MongoDB automatically generates an excellent document identifier, the _id
field. It's automatically there, is automatically unique, and is automatically increasing in value. But, my application is implementing an OpenAPI specification where many of the objects use an id
field as an object identifier. To implement that field, my application needed to generate a sequence of integers to put in that field.
If you'll pardon the SQL, this application needed a field defined like id MEDIUMINT NOT NULL AUTO_INCREMENT
(MySQL), but implemented on MongoDB. Unfortunately, MongoDB doesn't have this feature, because it has the _id
field instead.
The requirement boils down to reliably generating a value for id
that's incremented from the current maximum value for id
within the collection. That means retrieving the current maximum value for id
, incrementing it by one, and using that value in the id
field of a new document for the collection.
While it's simple in principle to generate a sequence of integers, putting it into practice has a couple tricky considerations.
The initial implementation and the JavaScript RangeError
The initial simplistic implmentation, in Node.js, is as follows:
const maxIDs = await db.reports.distinct('id');
if (maxIDs.length <= 0) data.id = '0';
else {
const _maxIDs = maxIDs.map(id => {
let ret = Number.parseInt(id);
if (isNaN(ret)) {
ret = id;
}
return ret;
});
data.id = (Math.max(..._maxIDs) + 1).toString();
}
It first uses the distinct
query to get an array of distinct values for the id
field. A special case is when this returns an empty array, to use '0'
. In the else
section we're dealing with the usual case of generating the next value for id
.
Because this particular API specifies id
as a string
, it is stored in the database as a string
, and distinct
gives us an array of strings. The final line is to use Math.max
to determine the maximum value. But, that means we need to convert the array of strings into an array of numbers. The last step is converting the number back to a string.
This simple code is straight-forward, easy to understnad, and ran excelently for several months. Then, it started throwing an error: RangeError: Maximum call stack size exceeded. The stack trace said pointed to the Math.max
call.
To iterate is human, to recurse is divine, unless recursion causes cursing
If you paid attention while taking CS101, you'll remember that the call stack records the parameters to each function call as well as local variables. As one function calls another, a new entry is added to the call stack, and when the function exits its entry is popped from that stack. In any programming language, the call stack has a maximum possible size, and exceeding this size is a serious error.
The most common cause of exceeding the call stack size is a recursive function. A badly coded recursive function can quickly blow past any maximum size for the call stack triggering that error.
But... There's nothing recursive here, right? Not so fast, because looks can be decieving. Turns out that Math.max
can throw this error for an array that is big enough. Its implementation must require some recursion. After a few months, the document collection had grown to over 100,000 documents, making for over 100,000 distinct values for id
. That was enough to throw the error in Math.max
.
Computing the integer sequence with iteration?
The distinct
query gives us an array, and somewhere in that array is the maximum value. We can find that maximum value by iterating over the array (a for
loop) rather than using Math.max
. But, that consumes more CPU time as the array size grows big. A solution relying on iteration, however, will certainly not blow up the call stack.
A solution using sort
rather than a for
loop is this:
const maxIDs = await Reports.distinct('id');
if (maxIDs.length <= 0) data.id = '0';
else {
const compare = (a, b) => {
const nA = Number.parseInt(a);
const nB = Number.parseInt(b);
if (nA < nB) {
return -1;
} else if (nA > nB) {
return 1;
} else { return 0; }
};
// let lastIDs = maxIDs.sort(compare).slice(-10);
// logger.info(`addReport last IDs ${util.inspect(lastIDs)}`);
const maxID = maxIDs.sort(compare).slice(-1)[0];
data.id = (Number.parseInt(maxID) + 1).toString();
}
This is a simple drop-in replacement, using Array.sort
instead of Math.max
. The idea is to sort the distinct values for id
then retrieve the last (or first) item. The bit with .slice(-1)[0]
is a tricky way to retrieve the last element of the sorted array. With this compare function, that last element will be the maximum value in the array. This could be simplified by reversing the sort order (changing the compare
function), in which case the maximum value is the first entry in the array.
The problem here is that as the number of items grows our application is slower. No matter how good the sort
function is, the more items it's sorting the more CPU time it consumes.
Computing the maximum value of an integer series in MongoDB
If we push this to MongoDB, it can theoretically optimize the operation better than the application code. For example, MongoDB could use the index for the id
field to aid in determining the maximum value, or in sorting the elements?
A
StackOverflow has these suggested MongoDB commands:
db.collection.find().sort({ id: -1}).limit(1) // for MAX
db.collection.find().sort({ id: +1}).limit(1) // for MIN
db.collection.aggregate({ $group : { _id: null, max: { $max: "$id" }}});
db.collection.aggregate({ $group : { _id: null, min: { $min: "$id" }}});
The first two use an empty find
to retrieve all elements, sort them on the chosen field, then return one document containing either the maximum or minimum value. Since this retrieves the entire document, it may be useful to use project
to limit the fields which are returned. There should also be a .toArray
to convert this into a useful form.
The second two use an aggregation pipeline. The $group
stage, along with the $max
or $min
operator, will cause the output to be:
[
{
"_id": null,
"max": MAXIMUM VALUE // or min: MINUMUM VALUE
}
]
MongoDB can use the index to improve performance, and it can also recognize the .sort(..).limit(..)
combination to optimize memory usage by keeping only the maximum (or minimum) value(s) in memory.
But, this did not work directly in my case. It always gave a maximum value of '99999'
, even though there were many more documents in that collection.
Turns out that a sequence of integers does not sort in numerical order when the integers are stored as strings.
That's another thing we were taught in CS101, but can easily forget. The string '101234'
is not greater than the string'99999'
. Therefore the '99999'
value always came up as the maximum value. The solution is to somehow numerically sort these numerical strings. Those command work very well if your application uses numbers, rather than strings. Since my application uses numerical strings, let's look a little further.
The compare
function shown earlier takes care of that problem by calling Number.parseInt
before doing the comparison. But the goal is to do this in MongoDB rather than in the application.
Numerically sorting strings in MongoDB
Another
StackOverflow posting discusses telling MongoDB to sort in numerical order. After reading that the final code became:
const last = await db.reports.find()
.sort({ id: -1 })
.collation({
locale: "en_US",
numericOrdering: true
})
.limit(1)
.toArray();
if (!Array.isArray(last)
|| last.length <= 0) {
data.id = '0';
} else {
let lastID = last[0].id;
data.id = (Number.parseInt(lastID) + 1).toString();
}
The collation document lets us tell MongoDB how to sort values (
documentation). The
numericOrdering
setting works for integers, but not floating point, and it does not consider +
or -
signs. As long as your application fits within those constraints, this is an excellent solution.
The StackOverflow posting also has a suggestion to use the $toInt
operator to convert the strings into actual integers, but I could not get that to work.
One downside of the final implementation is that the programmers intent is obscured. To understand that the intention is to generate the next item in the sequence of integers is not obvious. The code is wrapped around the mechanics of numerical sorting, retrieving one item, and incrementing that item, while maintaining the result as a string, that the quest for the next integer in a sequence is hard to see.
Warning: Potential race conditions
There is a timing issue not addressed in this article. We discussed computating the next sequence number for documents in a MongoDB collection. This relies on values of an id
field in the collection. The value just generated by your application must be stored into the collection in order to be recognized. There are two points in time: A) Computing the next sequence number, B) Storing that sequence number. Another thread might try to generate another sequence number at about the same time, and if it does so between time A and time B it will generate the a duplicate sequence number.
Therefore, your application must minimize the time between generating the sequence number, and storing it into the collection.
In my application, the code above is immediately followed by db.reports.insertOne(report)
. That leaves a small window of time during which a sequence number has been generated and the generated number is stored in the database.
Conclusion
As with many software development tasks, what seems like a simple task comes with many considerations. Even the tiniest and simplest of functions can generate errors.
Another learning is that MongoDB has a rich set of features and operations worth exploring. It's not just a repository for data, but it can be used for transmogrifying data in many ways.
Functionality that was first implemented almost solely in the application code is now almost solely executing on MongoDB. The MongoDB documentation claims many advantages from moving implementation into MongoDB. While in this case the final implementation obscured the programmers intent, aggregation pipelines often make the programmers intent in a large scale data transmogrification task more obvious.