Easily import your CSV data into MySQL with Node.js

By: (plus.google.com) +David Herron; Date: 2016-09-16 10:59

Tags: Node.JS » Node.js Database » Node Web Development » MySQL

It's convenient to import a CSV file into a spreadsheet to crunch numbers. CSV's (comma-seperated-values) and it's brethren (TSV, for tab-separated, etc) are widely used to deliver information. You'll see CSV downloads available all over the place, whether it's your stock broker or bank, or a government agency, and on and on. Spreadsheet programs like LibreOffice Calc offer direct import of these files, and then as a spreadsheet you've got a zillion tools available to crunch numbers, make graphs, etc. Sometimes, though, you need to load the CSV into a database to do other sorts of work.

What I have to show today is a simple Node.js script that automagically loads a CSV into a database, and you don't even have to run a CREATE TABLE command. This script automatically constructs a CREATE TABLE command that matches the columns in the CSV. The script can be used as a starting point if your specific need is a little different. In my case I just needed the data inside MySQL without too much worry over the column names.

The source code is available on gitlab: (gitlab.com) https://gitlab.com/robogeek/csv2db/blob/master/csv2mysql.js ... a nearly identical script, for importing into SQLITE3 is at https://gitlab.com/robogeek/csv2db/blob/master/csv2sqlite3.js

'use strict';

const parse      = require('csv-parse');
const util       = require('util');
const fs         = require('fs');
const path       = require('path');
const mysql      = require('mysql');
const async      = require('async');
const csvHeaders = require('csv-headers');
const leftpad    = require('leftpad');

const csvfn = process.argv[2];
const dbnm  = process.argv[3];
const tblnm = process.argv[4];

Here we're getting command line arguments - the filename for the CSV, the database name, and the name of the table to create.

new Promise((resolve, reject) => {
    csvHeaders({
        file      : csvfn,
        delimiter : ','
    }, function(err, headers) {
        if (err) reject(err);
        else resolve({ headers });
    });
})

The script is structured as a Promise chain, allowing us to string together several asynchronous operations. In this first step we use the csv-headers module to read the first line of the CSV. The script relies on there being a header line in the CSV naming each column.

Another thing is the CSV headers are returned as part of an object. In the subsequent steps we'll call that object context and we'll add other data as we go.

.then(context => {
    return new Promise((resolve, reject) => {
        context.db = mysql.createConnection({
            host     : '127.0.0.1',
            user     : 'user-name',
            password : 'pass-word',
            database : dbnm
        });
        context.db.connect((err) => {
            if (err) {
                console.error('error connecting: ' + err.stack);
                reject(err);
            } else {
                resolve(context);
            }
        });
    })
})
.then(context => {
    return new Promise((resolve, reject) => {
        context.db.query(`DROP TABLE IF EXISTS ${tblnm}`,
        [ ],
        err => {
            if (err) reject(err);
            else resolve(context);
        })
    });
})

In this step we connect to the MySQL database. And, we delete the table if it exists. The database connection is stored as context.db

.then(context => {
    return new Promise((resolve, reject) => {
        var fields = '';
        var fieldnms = '';
        var qs = '';
        context.headers.forEach(hdr => {
            hdr = hdr.replace(' ', '_');
            if (fields !== '') fields += ',';
            if (fieldnms !== '') fieldnms += ','
            if (qs !== '') qs += ',';
            fields += ` ${hdr} TEXT`;
            fieldnms += ` ${hdr}`;
            qs += ' ?';
        });
        context.qs = qs;
        context.fieldnms = fieldnms;
        // console.log(`about to create CREATE TABLE IF NOT EXISTS ${tblnm} ( ${fields} )`);
        context.db.query(`CREATE TABLE IF NOT EXISTS ${tblnm} ( ${fields} )`,
        [ ],
        err => {
            if (err) reject(err);
            else resolve(context);
        })
    });
})

In this step we process the CSV headers and construct a CREATE TABLE command. The field type assigned is TEXT. If a particular column name has spaces in it, the space character is translated to _ so it works as a MySQL column name.

As we go, a couple other arrays are created which are useful in later steps.

.then(context => {
    return new Promise((resolve, reject) => {
        fs.createReadStream(csvfn).pipe(parse({
            delimiter: ',',
            columns: true,
            relax_column_count: true
        }, (err, data) => {
            if (err) return reject(err);
            async.eachSeries(data, (datum, next) => {
                // console.log(`about to run INSERT INTO ${tblnm} ( ${context.fieldnms} ) VALUES ( ${context.qs} )`);
                var d = [];
                try {
                    context.headers.forEach(hdr => {
                        d.push(datum[hdr]);
                    });
                } catch (e) {
                    console.error(e.stack);
                }
                // console.log(`${d.length}: ${util.inspect(d)}`);
                if (d.length > 0) {
                    context.db.query(`INSERT INTO ${tblnm} ( ${context.fieldnms} ) VALUES ( ${context.qs} )`, d,
                    err => {
                        if (err) { console.error(err); next(err); }
                        else setTimeout(() => { next(); });
                    });
                } else { console.log(`empty row ${util.inspect(datum)} ${util.inspect(d)}`); next(); }
            },
            err => {
                if (err) reject(err);
                else resolve(context);
            });
        }));
    });
})
.then(context => { context.db.end(); })
.catch(err => { console.error(err.stack); });

Now we use the csv-parse module to read the CSV and then do MySQL commands to add each row to the database. This will read the entire CSV into memory before getting to the stage of adding rows to the database.

On each row, we construct an Array containing the value for each column, because that's what is required by MySQL's query function. The field names and ?'s required for the INSERT INTO command were constructed in the previous step. In this step we simply have to ensure creating the value array in the same order and number of entries.

What that means is - with INSERT INTO we list field names for which to insert values, and we give a matching list of ?'s as place-holders for the inserted values. Then we need a matching array of values to insert. That's how the MySQL query function works, and plenty of other SQL database drivers work the same way. That means the CSV must have the same number of columns on each row.

« The advent of async/await for Node.js - Node.js v7 has now arrived Fixing "Maximum call stack size exceeded" in async Node.js code »
2016 Election 2018 Elections Acer C720 Ad block Air Filters Air Quality Air Quality Monitoring AkashaCMS Amazon Amazon Kindle Amazon Web Services America Amiga and Jon Pertwee Android Anti-Fascism AntiVirus Software Apple Apple Hardware History Apple iPhone Apple iPhone Hardware April 1st Arduino ARM Compilation Artificial Intelligence Astronomy Astrophotography Asynchronous Programming Authoritarianism Automated Social Posting AWS DynamoDB AWS Lambda Ayo.JS Bells Law Big Brother Big Data Big Finish Big Science Bitcoin Mining Black Holes Blade Runner Blockchain Blogger Blogging Books Botnets Cassette Tapes Cellphones China China Manufacturing Christopher Eccleston Chrome Chrome Apps Chromebook Chromebox ChromeOS CIA CitiCards Citizen Journalism Civil Liberties Climate Change Clinton Cluster Computing Command Line Tools Comment Systems Computer Accessories Computer Hardware Computer Repair Computers Conservatives Cross Compilation Crouton Cryptocurrency Curiosity Rover Currencies Cyber Security Cybermen Cybersecurity Daleks Darth Vader Data backup Data Formats Data Storage Database Database Backup Databases David Tenant DDoS Botnet Department of Defense Department of Justice Detect Adblocker Developers Editors Digital Nomad Digital Photography Diskless Booting Disqus DIY DIY Repair DNP3 Do it yourself Docker Docker MAMP Docker Swarm Doctor Who Doctor Who Paradox Doctor Who Review Drobo Drupal Drupal Themes DVD E-Books E-Readers Early Computers eGPU Election Hacks Electric Bicycles Electric Vehicles Electron Eliminating Jobs for Human Emdebian Encabulators Energy Efficiency Enterprise Node EPUB ESP8266 Ethical Curation Eurovision Event Driven Asynchronous Express Face Recognition Facebook Fake News Fedora VirtualBox Fifth Doctor File transfer without iTunes FireFly Flash Flickr Fraud Freedom of Speech Front-end Development G Suite Gallifrey Gig Economy git Github GitKraken Gitlab GMAIL Google Google Chrome Google Gnome Google+ Government Spying Great Britain Green Transportation Hate Speech Heat Loss Hibernate High Technology Hoax Science Home Automation HTTP Security HTTPS Human ID I2C Protocol Image Analysis Image Conversion Image Processing ImageMagick In-memory Computing InfluxDB Infrared Thermometers Insulation Internet Internet Advertising Internet Law Internet of Things Internet Policy Internet Privacy iOS iOS Devices iPad iPhone iPhone hacking Iron Man iShowU Audio Capture iTunes Janet Fielding Java JavaFX JavaScript JavaScript Injection JDBC John Simms Journalism Joyent Kaspersky Labs Kext Kindle Kindle Marketplace Large Hadron Collider Lets Encrypt LibreOffice Linux Linux Hints Linux Single Board Computers Logging Mac Mini Mac OS Mac OS X MacBook Pro Machine Learning Machine Readable ID Macintosh macOS macOS High Sierra macOS Kext MacOS X setup Make Money Online Make Money with Gigs March For Our Lives MariaDB Mars Mass Violence Matt Lucas MEADS Anti-Missile Mercurial MERN Stack Michele Gomez Micro Apartments Microsoft Military AI Military Hardware Minification Minimized CSS Minimized HTML Minimized JavaScript Missy Mobile Applications Mobile Computers MODBUS Mondas Monetary System MongoDB Mongoose Monty Python MQTT Music Player Music Streaming MySQL NanoPi Nardole NASA Net Neutrality Network Attached Storage Node Web Development Node.js Node.js Database Node.js Performance Node.js Testing Node.JS Web Development Node.x North Korea npm NVIDIA NY Times Online advertising Online Community Online Fraud Online Journalism Online Photography Online Video Open Media Vault Open Source Open Source and Patents Open Source Governance Open Source Licenses Open Source Software OpenAPI OpenJDK OpenVPN Palmtop PDA Patrick Troughton PayPal Paywalls Personal Flight Peter Capaldi Peter Davison Phishing Photography PHP Plex Plex Media Server Political Protest Politics Postal Service Power Control President Trump Privacy Private E-mail server Production use Public Violence Raspberry Pi Raspberry Pi 3 Raspberry Pi Zero ReactJS Recaptcha Recycling Refurbished Computers Remote Desktop Removable Storage Republicans Retro Computing Retro-Technology Reviews RFID Rich Internet Applications Right to Repair River Song Robotics Robots Rocket Ships RSS News Readers rsync Russia Russia Troll Factory Russian Hacking Rust SCADA Scheme Science Fiction SD Cards Search Engine Ranking Season 1 Season 10 Season 11 Security Security Cameras Server-side JavaScript Serverless Framework Servers Shell Scripts Silence Simsimi Skype SmugMug Social Media Social Media Networks Social Media Warfare Social Network Management Social Networks Software Development Software Patents Space Flight Space Ship Reuse Space Ships SpaceX Spear Phishing Spring Spring Boot Spy Satellites SQLite3 SSD Drives SSD upgrade SSH SSH Key SSL Stand For Truth Strange Parts Swagger Synchronizing Files Tegan Jovanka Telescopes Terrorism The Cybermen The Daleks The Master Time-Series Database Tom Baker Torchwood Total Information Awareness Trump Trump Administration Trump Campaign Twitter Ubuntu Udemy UDOO US Department of Defense Video editing Virtual Private Networks VirtualBox VLC VNC VOIP Vue.js Walmart Weapons Systems Web Applications Web Developer Resources Web Development Web Development Tools Web Marketing Webpack Website Advertising Weeping Angels WhatsApp William Hartnell Window Insulation Windows Windows Alternatives Wordpress World Wide Web Yahoo YouTube YouTube Monetization