Easily export or import data between CSV files and SQLite3 or MySQL databases with Node.js

; Date: Sat Feb 27 2021

Tags: Node.JS »»»» Node.js Database »»»» Node Web Development »»»» MySQL

It's convenient to import a CSV file into a spreadsheet to crunch numbers. CSV's (comma-separated-values) and it's brethren (TSV, for tab-separated, etc) are widely used to deliver information. You'll see CSV downloads available all over the place, whether it's your stock broker, a bank, or a government agency, and on and on. Spreadsheet programs like LibreOffice Calc offer direct import of these files, and then as a spreadsheet you've got a zillion tools available to crunch numbers, make graphs, etc. Sometimes, though, you need to load the CSV into a database to do other sorts of work. That's what we'll look at, the twin tasks of autoloading a CSV file into a MySQL database, and of exporting data from MySQL into a CSV file.

The original inspiration for this came while working on a project to report on electricity production from a solar array. Another team in our company maintained a system using sensors to record data in the field at client sites, which was transmitted over the Internet to our servers, and stored as both CSV files and in a database. My team produced a second database derived from data in the first database. My task was to analyze data accuracy between the two databases. Bottom line was I needed a fast way to load CSV files into a database. The CSV files had a different set of columns for each client site, and the need was to load any CSV quickly without having to first define a schema.

Quickly loading a CSV file into a database of course requires reading each row of the CSV, converting each into an INSERT INTO command. But you first need a schema for the database table. In a more formal project you might formally analyze your needs, and create a database schema. But in a one-off situation you just want to load the data to play with it. The answer is to retrieve from the CSV file the column names, converting those names into a CREATE TABLE statement describing the schema.

This is implemented as a Node.js script using ES6 module syntax. Async/Await keywords are used at the top-level of the script, requiring that you use Node.js 14.8 or later. That release contained a new feature allowing use of async and await keywords at the top level of the script. Node.js Script writers: Top-level async/await now available

The result demonstrates examining a CSV file to get the column header declaration, and then automatically generating a database schema, and finally reading the CSV file to insert its data into a database. The script automagically loads a CSV into a database, and you don't even have to run a CREATE TABLE command. Instead, the CREATE TABLE command is automatically constructed, as are the INSERT INTO commands.

At the bottom of this we'll walk through a companion script that takes an SQL query, and produces a CSV from the results.

Before reading this tutorial it's recommended to read a companion tutorial on reading and processing CSV files in Node.js. Memory-efficient CSV transformation into CSV or text reports in Node.js

The source code is available on gitlab: (gitlab.com) gitlab.com robogeek csv2db

This tutorial is updated from an earlier posting written in 2016.

Let's start with an SQLite3 database since we can prototype the code without setting up a database server. The MySQL version will be nearly identical.

Project initialization

You can inspect the project code from the Gitlab repository. However, let's talk about setting this up from scratch:

$ mkdir csv2db
$ cd csv2db
$ npm init -y
$ npm install csv-headers csv mysql sqlite3 --save

This creates a project directory, creates a package.json, and installs the packages required for this project. We make sure to save package references in package.json so we can reconstruct this using npm install at any time.

Loading data from a CSV into a SQLite3 database

In this section we'll walk through a script to support loading a CSV file into an SQLite3 database. Create a file named csv2sqlite3.mjs:

import parse from 'csv-parse';
import fs from 'fs';
import sqlite3 from 'sqlite3';
import csvHeaders from 'csv-headers';

const dbfn  = process.argv[2];
const tblnm = process.argv[3];
const csvfn = process.argv[4];

Here we're getting command line arguments - the filename for the CSV, the database name, and the name of the table to create.

const headers = await new Promise((resolve, reject) => {
    csvHeaders({
        file      : csvfn,
        delimiter : ','
    }, function(err, headers) {
        if (err) reject(err);
        else {
            resolve(headers.map(hdr => {
                // Sometimes header names are :- "Long Header Name"
                // We need to remove the "-marks
                return hdr.replace(/["]/g, '');
            }));
        }
    });
});

The csv-headers package reads a CSV file to get the headers line, which contains the column names for the file. That's all it does, and it appears the package has not required any update in many years. Since it was designed before async/await functions, it still uses the callback paradigm, and therefore we have to wrap it in this Promise object.

What we'll get is an array containing whatever names are in the first row of the CSV. Those names might not be directly suitable as a field name in an SQL database. Namely, a header name with spaces in it, like Display Name, is rendered with quote marks, like "Display Name". It is best to remove the quote marks, which we do here.

const db = new sqlite3.Database(dbfn);
db.on('error', err => { 
    console.log(err);
    process.exit(-1);
});
// db.on('trace', sql => {
//     console.log(sql);
// });

await new Promise((resolve, reject) => {
    db.run(`DROP TABLE IF EXISTS ${tblnm}`,
    [ ],
    err => {
        if (err) reject(err);
        else resolve();
    })
});

In this step we connect to the MySQL database. And, we delete the table if it exists.

The SQLite3 module doesn't support returning a Promise, so we'll be wrapping many function calls in a Promise object like this.

The database connection is set up with two useful event listeners. The first, error, handles any error by printing a message then exiting. For a simple script like this that's the appropriate action. The second, trace, prints every SQL statement that's executed, giving a useful trace of activity if you need it.


const fixfieldnm = (nm) => { return nm.replace(/[ "]/g, '_'); }

// Convert header names into names suitable for database schema
const fields = headers.map(hdr => {
    return fixfieldnm(hdr);
});
// console.log(fields);

// Convert fields array into one with TEXT data type
const schema = headers.map(field => {
    return `${fixfieldnm(field)} TEXT`;
});
// console.log(schema);
// console.log(schema.join(', '));

// Generate an array of question markes for use in prepared statements
const questionMarks = headers.map(field => {
    return '?';
});
// console.log(questionMarks);
// console.log(questionMarks.join(', '));

Here we create three variables derived from the CSV header values. The first, fields, converts the header names into names suitable as database field names. Column names in CSV files might contain characters that cannot be used as an SQL column name. This function converts some characters seen in testing into the innocuous _ character.

The second, schema, converts that into another array field names followed by the TEXT data type. The last, questionMarks, is a string that will be used with prepared statements.

Some console.log statements are here for debugging purposes. Later we'll be converting these arrays into a string, with comma's separating entries in the strings. For that we'll use the join function.

await new Promise((resolve, reject) => {
    // console.log(`about to create CREATE TABLE IF NOT EXISTS ${tblnm} ( ${fields} )`);
    db.run(`CREATE TABLE IF NOT EXISTS ${tblnm} ( ${schema.join(', ')} )`,
    [ ],
    err => {
        if (err) reject(err);
        else resolve();
    })
});

In this step we process the CSV headers and construct a CREATE TABLE command. This uses the schema array to assign the TEXT data type to each column.

As we go, a couple other arrays are created which are useful in later steps.

// Read the CSV file using Node.js streaming
const parser = fs.createReadStream(csvfn).pipe(parse({
    delimiter: ',',
    columns: true,
    relax_column_count: true
}));

// Receive each row of the CSV file, insert into database
for await (const row of parser) {

    // Possibly there is custom processing required,
    // For example you might know of certain fields, and
    // of certain constraints.  In this case we might know
    // that certain records are to be skipped.
    if (row['Email'] === '(Not disclosed)') continue;

    let d = [];
    headers.forEach(hdr => {
        d.push(row[hdr]);
    });
    await new Promise((resolve, reject) => {
        db.run(`INSERT INTO ${tblnm} ( ${fields.join(', ')} )
                VALUES ( ${questionMarks.join(', ')} )`, d,
        err => {
            if (err) {
                console.log(`${fields.join(', ')}`, d);
                console.error(err);
                /* reject(err); */ resolve();
            }
            else resolve();
        });
    });
}

db.close(); 

This is where we actually load data into the database. The core is using the INSERT INTO command, suitably configured, to add data to the table.

The first step using the csv-parse module to read the CSV file. To learn more about this see Memory-efficient CSV transformation into CSV or text reports in Node.js

As configured here the parser object expects to find a comma-separated file, and it will accommodate the first line containing column names. It is a Node.js stream, making it possible to use this in several ways. As shown here we use an async-friendly for await (..of..) loop. Each loop body invocation corresponds to one line in the CSV file.

You may want to do custom processing of the data before adding it into the database. For example, you may know of certain data patterns to not add to the database. As shown here, simply use continue to skip adding records.

What arrives in the loop body is an array indexed by the field name supported by the CSV file. The first thing in the loop body is stepping through the field names, to get the data for each column, appending that data into the array named d. The next part takes that data, and the field names, and constructs an INSERT INTO command.

In other words there are three arrays with values that are supposed to correlate with each other: the field names, the question marks, and the field data. Each are supplied to this statement.

To run this script:

$ node  ./csv2sqlite3.mjs data-base-name.sqlite3 table-name /path/to/file-name.csv

After a short time the database file will be created. To inspect the database you need the SQLite3 command-line tools installed on your computer. With them you run this:

$ sqlite3 data-base-name.sqlite3
SQLite version 3.34.1 2021-01-20 14:10:07
Enter ".help" for usage hints.
sqlite> 

Which gets you into an interactive command shell. Typing .schema (note the period at the beginning of that command) prints out the definition of all tables in the database. Your database will contain fields corresponding to the column names. Typing SELECT * FROM table-name will print out the data.

Now that we've validated the basic concept, let's move on to the MySQL version of this script.

Using Docker to set up a MySQL server

To proceed you must have a MySQL server running. For more information see: Set up PHPMyAdmin and Wordpress Docker containers using existing MySQL

Assuming you have Docker setup on your laptop, run this:

$ docker run -d --name=mysql --env MYSQL_ROOT_PASSWORD=f00bar \
        --env MYSQL_USER=me --env MYSQL_PASSWORD=passw0rd \
        --env MYSQL_DATABASE=csv-database \
        --volume `pwd`/mysql-data:/var/lib/mysql \
        -p 3306:3306 \
        mysql/mysql-server:5.7

This initializes a MySQL database instance, using MySQL version 5.7, with a table named csv-database that is accessible from the user me with password passw0rd. The data directory is on your laptop at mysql-data in your current directory. The important thing is it exports port 3306, the standard MySQL port, to your laptop so that software on your laptop can access the database.

MySQL 5.7 is an older release at this time, however using the MySQL package with MySQL v8 gave an error message. It was easier to downgrade to MySQL 5.7 than to work out the fix for that error.

To test the database, wait for a minute or so, then run this:

$ docker exec -it mysql mysql -u root -h localhost -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
...

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| csv-database       |
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.04 sec)

mysql>

This lets you verify that the database is running, and you'll be able to access its contents.

Loading data from a CSV file into a MySQL database

We'll find that the code for MySQL is very similar to the SQLite3 code we just wrote. Run this command:

$ cp csv2sqlite3.mjs csv2mysql.mjs

At the top, import the MySQL package:

import mysql from 'mysql';

This replaces the existing SQLite3 import.

Then replace the code to look for command-line arguments:

const dbhost = process.argv[2];
const dbuser = process.argv[3];
const dbpass = process.argv[4];
const dbname = process.argv[5];
const tblnm  = process.argv[6];
const csvfn  = process.argv[7];

Connecting to MySQL requires several more parameters.

Then replace the code for connecting to the database:

const db = mysql.createConnection({
    host     : dbhost,
    user     : dbuser,
    password : dbpass,
    database : dbname,
    // debug    : true
});
db.on('error', err => { 
    console.log(err);
    process.exit(-1);
});

await new Promise((resolve, reject) => {
    db.connect((err) => {
        if (err) {
            console.error('error connecting: ' + err.stack);
            reject(err);
        } else {
            resolve();
        }
    });
});

There's a little bit of difference in creating the connection, but conceptually it's the same. Primarily we pass the connection parameters, and other configuration attributes, to the createConnection function. The debug option prints out protocol-level data, if you need to see that.

Then at the bottom:

db.end();

The MySQL driver uses db.end to close the database connection rather than db.close.

The last change to make is for every instance of db.run, change it to db.query. The MySQL package uses db.query instead of db.run. The behavior is the same between both, just a different function name.

Everything else remains the same. Both the SQLite3 and MySQL packages follow the same pattern, and have very similar API's.

To use the script to load a CSV into the database, run this:

$ node ./csv2mysql.mjs localhost me passw0rd \
        csv-database members \
        /path/to/file-name.csv 

These command line arguments match the code we wrote. With this many options it's useful to consider a package like Commander to professionally parse command line options, but I digress.

To check the database, log in and run these commands:

mysql> use csv-database;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+------------------------+
| Tables_in_csv-database |
+------------------------+
| members                |
+------------------------+
1 row in set (0.00 sec)

mysql> select * from members;
...

Exporting a CSV file from an SQLite3 database

We've developed a fairly powerful script to import data from a CSV into an SQL database. Now let's look at the opposite, exporting data as CSV from an SQL database.

The model we'll use is to supply an SQL query, then generate a CSV file containing the columns in the SQL query. Each row in the result set will be a row in the CSV, and the columns will correspond to whatever is in the SELECT command.

To prototype the script, let's start with the SQLite3 database.

Create a file named sqlite2csv.mjs containing:

import stringify from 'csv-stringify';
import sqlite3 from 'sqlite3';

const dbfn    = process.argv[2];
const dbquery = process.argv[3];

const db = new sqlite3.Database(dbfn);
db.on('error', err => { 
    console.log(err);
    process.exit(-1);
});
// db.on('trace', sql => {
//     console.log(sql);
// });

let stringifier;
let columns;

await new Promise((resolve, reject) => {
    db.each(dbquery, function(err, row) {
        // This function is executed for each row in the result set.
        // For the first invocation, we initialize the columns and stringifier.
        // For every invocation, we write the row object to the stringifier.
        // The row object is a simple object where the fields match the columns
        // specified in the query.
        if (err) { reject(err); }
        if (!columns) {
            columns = Object.keys(row);
        }
        if (!stringifier) {
            stringifier = stringify({
                delimiter: ',',
                header: true,
                columns: columns
            });
            stringifier.pipe(process.stdout);
        }
        // console.log(row);
        stringifier.write(row);
    }, function(err, count) {
        // This is invoked when all rows are processed.
        if (err) { reject(err); }
        // console.log(`FINISHED ${count} rows`);
        stringifier.end();
        resolve();
    })
});

db.close(); 

With SQLite3 the db.each function takes an SQL command, and two callbacks. The first is called for each row in the result set, and the second is called when all rows have been processed. This is preferable over the db.all function which first loads all rows in the result set into memory before invoking the callback. Using db.each handles large result sets with ease, while db.all risks an out of memory error.

The next question is how do we determine the columns of the CSV file? It's easiest if they correspond to the columns of the SQL query. In the case of the db.each function there is no method to independently get that list of columns. Instead we must determine this in the first invocation of the callback function.

There is also no method to determine the correct order of the columns. What we receive is a simple object. The fields of the object do correspond to fields from the SELECT statement, but the ordering of the columns is not deterministic.

On the first invocation both the columns and stringifier variables are undefined. That lets us initialize them on the first invocation, then for each subsequent invocation to skip the initialization. The Object.keys function simply gives us the field names of the object, which corresponds to the column names as we just discussed. That makes this array suitable for supplying column names to the Stringifier.

The Stringifier is configured to construct a CSV, using the column names determined by calling Object.keys. The CSV output is streamed to the standard output.

On every invocation, stringifier.write is called so that it see's every row in the result set.

The second callback is invoked when all rows have been processed. In this case we simply need to call resolve so that the Promise wrapped around this knows to finalize itself.

To execute the script:

$ node ./sqlite2csv.mjs members.sqlite3 \
        'SELECT * FROM members WHERE Email != "(Not disclosed)"' 

We can theoretically put any SQL statement here, so feel free to experiment. This will primarily be used with SELECT statements, but of course those can become quite complex with INNER JOIN clauses and everything else. But what's important here is that columns of the result set are determined by the SELECT statement, and not by any table.

Exporting a CSV file from a MySQL database

Now that we've prototyped a script using the SQLite3 database, let's see how to do the same using the MySQL database. The structure is roughly the same, but the MySQL database driver gives us a different option for streaming through the result set.

Create a file named mysql2csv.mjs containing:


import stringify from 'csv-stringify';
import mysql from 'mysql';

const dbhost  = process.argv[2];
const dbuser  = process.argv[3];
const dbpass  = process.argv[4];
const dbname  = process.argv[5];
const dbquery = process.argv[6];

const db = mysql.createConnection({
    host     : dbhost,
    user     : dbuser,
    password : dbpass,
    database : dbname,
    // debug    : true
});
db.on('error', err => {
    console.log(err);
    process.exit(-1);
});

await new Promise((resolve, reject) => {
    db.connect((err) => {
        if (err) {
            console.error('error connecting: ' + err.stack);
            reject(err);
        } else {
            resolve();
        }
    });
});

This sets up the database connection particulars, and connects to the database.

let stringifier;
let columns;
const query = db.query(dbquery);
query
  .on('error', function(err) {
    // Handle error, an 'end' event will be emitted after this as well
    console.log(err);
    process.exit(-1);
  })
  .on('fields', function(fields) {
    // the field packets for the rows to follow

    columns = fields.map(field => field.name);
    stringifier = stringify({
        delimiter: ',',
        header: true,
        columns: columns
    });
    stringifier.pipe(process.stdout);

  })
  .on('result', function(row) {
    db.pause();
    stringifier.write(row);
    db.resume();
  })
  .on('end', function() {
    // all rows have been received
    stringifier.end();
    db.end(); 
  });

This handles making the query, and streaming through the results. Instead of using multiple callbacks, we get an object that emits events, to which we can attach event handlers.

The fields event gives us the field names, and an opportunity to construct the Stringifier. We've configured it the same as in the SQLite3 script.

The result event is where we receive each row in the result set. The row parameter is a simple object containing data for the row, with each field corresponding to a column of the result set. That lets us call stringifier.write to send the data to the Stringifier. The db.pause and db.resume calls handle throttling the incoming data so that the script is not overwhelmed.

Once every row is processed the end event is emitted, in which we close both the Stringifier and the database connection.

To execute the script:

$ node ./mysql2csv.mjs localhost me passw0rd csv-database \
        'SELECT Display_Name as Name, Email FROM members WHERE Email = "(Not disclosed)"' 

Connecting to an MySQL database requires more parameters than for SQLite3, of course. What we said earlier about SELECT statements also applies here. We demonstrate here that you can of course change the name of an output column using an AS clause, because this is SQL and you're free to do so.

Summary

In this tutorial we've learned how to easily import CSV data into SQL databases, and to export CSV data from SQL databases. While we focused on SQLite3 and MySQL, other SQL databases have similar database drivers. These scripts should be applicable to other databases as a result.

In practice you may want to customize these scripts for the specific needs of a particular import or export task. You may need to synthesize columns, for example, or to retrieve data from other sources to include with the data being processed.