Easily import your CSV data into MySQL with Node.js

By: ; Date: 2016-09-16 10:59

Tags: Node.JS » Node.js Database » Node Web Development » MySQL

It's convenient to import a CSV file into a spreadsheet to crunch numbers. CSV (comma-separated values) files and their brethren (TSV, for tab-separated, etc.) are widely used to deliver information. You'll see CSV downloads available all over the place, whether it's your stock broker or bank, or a government agency, and on and on. Spreadsheet programs like LibreOffice Calc offer direct import of these files, and once the data is in a spreadsheet you've got a zillion tools available to crunch numbers, make graphs, etc. Sometimes, though, you need to load the CSV into a database to do other sorts of work.

What I have to show today is a simple Node.js script that automagically loads a CSV into a database, and you don't even have to run a CREATE TABLE command. This script automatically constructs a CREATE TABLE command that matches the columns in the CSV. The script can be used as a starting point if your specific need is a little different. In my case I just needed the data inside MySQL without too much worry over the column names.

The source code is available on GitLab: https://gitlab.com/robogeek/csv2db/blob/master/csv2mysql.js ... a nearly identical script, for importing into SQLite3, is at https://gitlab.com/robogeek/csv2db/blob/master/csv2sqlite3.js

'use strict';

const parse      = require('csv-parse');
const util       = require('util');
const fs         = require('fs');
const path       = require('path');
const mysql      = require('mysql');
const async      = require('async');
const csvHeaders = require('csv-headers');
const leftpad    = require('leftpad');

const csvfn = process.argv[2];
const dbnm  = process.argv[3];
const tblnm = process.argv[4];

Here we're getting command line arguments - the filename for the CSV, the database name, and the name of the table to create.
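For example, assuming a hypothetical CSV of stock prices named prices.csv, a database named stocksdb, and a target table named prices, you'd run the script like so:

node csv2mysql.js prices.csv stocksdb prices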

new Promise((resolve, reject) => {
    csvHeaders({
        file      : csvfn,
        delimiter : ','
    }, function(err, headers) {
        if (err) reject(err);
        else resolve({ headers });
    });
})

The script is structured as a Promise chain, allowing us to string together several asynchronous operations. In this first step we use the csv-headers module to read the first line of the CSV. The script relies on there being a header line in the CSV naming each column.

Note also that the CSV headers are returned as part of an object. In the subsequent steps we'll call that object context, and we'll add other data to it as we go.
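To make that concrete, if the hypothetical prices.csv began with a header line of Date,Open,High,Low,Close, the object resolved in this first step would look roughly like this:

{ headers: [ 'Date', 'Open', 'High', 'Low', 'Close' ] }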

.then(context => {
    return new Promise((resolve, reject) => {
        context.db = mysql.createConnection({
            host     : '127.0.0.1',
            user     : 'user-name',
            password : 'pass-word',
            database : dbnm
        });
        context.db.connect((err) => {
            if (err) {
                console.error('error connecting: ' + err.stack);
                reject(err);
            } else {
                resolve(context);
            }
        });
    })
})
.then(context => {
    return new Promise((resolve, reject) => {
        context.db.query(`DROP TABLE IF EXISTS ${tblnm}`,
        [ ],
        err => {
            if (err) reject(err);
            else resolve(context);
        })
    });
})

In this step we connect to the MySQL database, and we delete the table if it already exists. The database connection is stored as context.db.
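For the hypothetical prices table, the statement produced by that template string is simply:

DROP TABLE IF EXISTS prices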

.then(context => {
    return new Promise((resolve, reject) => {
        var fields = '';
        var fieldnms = '';
        var qs = '';
        context.headers.forEach(hdr => {
            hdr = hdr.replace(/ /g, '_'); // replace every space so the header works as a column name
            if (fields !== '') fields += ',';
            if (fieldnms !== '') fieldnms += ','
            if (qs !== '') qs += ',';
            fields += ` ${hdr} TEXT`;
            fieldnms += ` ${hdr}`;
            qs += ' ?';
        });
        context.qs = qs;
        context.fieldnms = fieldnms;
        // console.log(`about to create CREATE TABLE IF NOT EXISTS ${tblnm} ( ${fields} )`);
        context.db.query(`CREATE TABLE IF NOT EXISTS ${tblnm} ( ${fields} )`,
        [ ],
        err => {
            if (err) reject(err);
            else resolve(context);
        })
    });
})

In this step we process the CSV headers and construct a CREATE TABLE command. The field type assigned is TEXT. If a particular column name has spaces in it, the space character is translated to _ so it works as a MySQL column name.

As we go, a couple of other strings (the field-name list and the ? placeholders) are built up which are useful in later steps.
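To illustrate with the hypothetical Date,Open,High,Low,Close headers from earlier, the strings built up in this step come out roughly as:

fields   = ' Date TEXT, Open TEXT, High TEXT, Low TEXT, Close TEXT'
fieldnms = ' Date, Open, High, Low, Close'
qs       = ' ?, ?, ?, ?, ?'

so the generated statement is:

CREATE TABLE IF NOT EXISTS prices ( Date TEXT, Open TEXT, High TEXT, Low TEXT, Close TEXT )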

.then(context => {
    return new Promise((resolve, reject) => {
        fs.createReadStream(csvfn).pipe(parse({
            delimiter: ',',
            columns: true,
            relax_column_count: true
        }, (err, data) => {
            if (err) return reject(err);
            async.eachSeries(data, (datum, next) => {
                // console.log(`about to run INSERT INTO ${tblnm} ( ${context.fieldnms} ) VALUES ( ${context.qs} )`);
                var d = [];
                try {
                    context.headers.forEach(hdr => {
                        d.push(datum[hdr]);
                    });
                } catch (e) {
                    console.error(e.stack);
                }
                // console.log(`${d.length}: ${util.inspect(d)}`);
                if (d.length > 0) {
                    context.db.query(`INSERT INTO ${tblnm} ( ${context.fieldnms} ) VALUES ( ${context.qs} )`, d,
                    err => {
                        if (err) { console.error(err); next(err); }
                        else setTimeout(() => { next(); });
                    });
                } else { console.log(`empty row ${util.inspect(datum)} ${util.inspect(d)}`); next(); }
            },
            err => {
                if (err) reject(err);
                else resolve(context);
            });
        }));
    });
})
.then(context => { context.db.end(); })
.catch(err => { console.error(err.stack); });

Now we use the csv-parse module to read the CSV, and then run an INSERT command to add each row to the database. Note that this reads the entire CSV into memory before getting to the stage of adding rows to the database.

On each row, we construct an Array containing the value for each column, because that's what is required by MySQL's query function. The field names and ?'s required for the INSERT INTO command were constructed in the previous step. In this step we simply have to build the value array with the same order and number of entries.

What that means is that with INSERT INTO we list the field names for which to insert values, and we give a matching list of ?'s as placeholders for the inserted values. Then we need a matching array of values to insert. That's how the MySQL query function works, and plenty of other SQL database drivers work the same way. It also means the CSV must have the same number of columns on each row.
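So for one row of the hypothetical prices table, the call made in this step would look roughly like this (the values are made-up sample data):

context.db.query('INSERT INTO prices ( Date, Open, High, Low, Close ) VALUES ( ?, ?, ?, ?, ? )',
    [ '2016-09-16', '102.35', '103.10', '101.87', '102.95' ],
    err => { if (err) console.error(err); });

The mysql driver substitutes each ? with the corresponding array element, escaping it along the way, which is why the order and count of the values must match the field list.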
