Memory-efficient CSV transformation in Node.js

By: David Herron; Date: 2016-11-14 17:06

Tags: Node.JS

Those of us who consume, edit, modify, or publish CSV files must from time to time transform a CSV file. Maybe you need to delete columns, rearrange columns, add columns, rename columns, or compute some values, taking one CSV file and producing another. In my case, I have a raw CSV file with no column headers that's organized in a way that makes sense for one team in our company, but we need the same data organized differently, with different column names and containing only selected fields. The following script came from that need, and I managed to write it in a fairly generic way. It not only extracts and renames columns, but with a bit of coding could perform other transformations.

As such, this script performs a map operation, meaning it takes an input CSV and produces an output CSV with the same number of rows. The row contents are of course different, but the number of rows is the same for input and output. A reduce or filter operation would be difficult with this script as it is written, because both decrease the number of rows.

The script relies on the CSV Suite for Node.js (the csv-parse, csv-stringify, and stream-transform packages):

/**
 * This script demonstrates a simple CSV transformation that's
 * formulated to use minimal memory.  The processing is done via
 * piping using the Node.js Streams interface.
 *
 * This transformation is to extract selected columns from the
 * input file, then write to another file using different column names.
 * The `transform` section could make other changes such as adding
 * columns together.
 */
'use strict';

const parse     = require('csv-parse');
const stringify = require('csv-stringify');
const transform = require('stream-transform');
const fs        = require('fs-extra-promise');

// Input and output file names come from the command line.
const infname   = process.argv[2];
const outfname  = process.argv[3];

const inputFields = [
    // List field names for input file
];

const extractFields = [
    // List field names to extract from input
];

const outputFields = [
    // List field names in the output file
];

// Stage 1: read the input file and parse each line into a record.
fs.createReadStream(infname)
.pipe(parse({
    delimiter: ',',
    // Use columns: true if the input has column headers
    // Otherwise list the input field names in the array above.
    columns: inputFields
}))
// Stage 2: transform each record; called once per row.
.pipe(transform(function(data) {
    // This sample transformation selects out fields
    // that will make it through to the output.  Simply
    // list the field names in the array above.
    return extractFields
    .map(nm => { return data[nm]; });
}))
// Stage 3: format each transformed row as CSV text.
.pipe(stringify({
    delimiter: ',',
    relax_column_count: true,
    skip_empty_lines: true,
    header: true,
    // This names the resulting columns for the output file.
    columns: outputFields
}))
// Stage 4: write the CSV text to the output file.
.pipe(fs.createWriteStream(outfname));

The input file name and output file name are given on the command line. It's a good idea for the input file to have CSV headers, but as written the script does not require them. Column headers are a feature not used in all CSV files: in some files the first row gives a name for each column. Such a file is more useful since the documentation of the fields is in the file itself. But obviously not everyone does this, and perhaps some software would choke on the column names.

In this script, if your input file has column names then make a change in the first stage:

    delimiter: ',',
    columns: true

Otherwise, list the column names in the inputFields array.
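Either way, the point of the columns option is that each row reaches the transformation as an object keyed by field name rather than as a bare array. The sketch below imitates that effect for a single row so you can see the shape of the data. The toRecord helper and the field names are hypothetical, made up for illustration, and are not part of the parser.

```javascript
'use strict';

// Hypothetical field names, standing in for the inputFields array.
const inputFields = [ 'id', 'firstName', 'lastName' ];

// Hypothetical helper imitating the effect of naming the columns:
// pair each field name with the value at the same position in the row.
function toRecord(fieldNames, values) {
    const record = {};
    fieldNames.forEach((name, i) => { record[name] = values[i]; });
    return record;
}

// One raw row, as it might appear in a headerless CSV file.
const record = toRecord(inputFields, [ '17', 'Ada', 'Lovelace' ]);

console.log(record.firstName); // prints "Ada"
```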

The second stage is the transformation. The algorithm shown here simply extracts the fields named in the extractFields array. You can rename columns, reorder columns, and eliminate columns this way.
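To make the extraction step concrete, here is the same map logic applied to one sample row outside of any stream. The field names and the sample data are assumptions for illustration only. Fields left out of extractFields are dropped, and the order of extractFields decides the order of the output columns.

```javascript
'use strict';

// Hypothetical field names: keep three of the four input fields,
// in a different order than they appear in the input.
const extractFields = [ 'email', 'lastName', 'firstName' ];

// Same logic as the transform stage: pick the named fields, in order.
function extractRow(data) {
    return extractFields.map(nm => data[nm]);
}

// A sample parsed row, keyed by input field name.
const row = {
    id: '17',
    firstName: 'Ada',
    lastName: 'Lovelace',
    email: 'ada@example.com'
};

const out = extractRow(row);
console.log(out.join(','));
```

Renaming happens in the last stage: the first entry of outputFields names the column that holds the first extracted field, and so on, so the two arrays should line up position by position.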

Other transformations can be performed. This function will be called once per row, and the return value from the function constitutes the new value for the row. Hence, the transformation can neither add nor delete rows, meaning the transformed file has the same number of rows on output as on input.
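As one example of another transformation, the function could compute a new column from existing ones. This is only a sketch: the region, q1, and q2 field names are hypothetical, and it assumes the parsed values arrive as strings, so they are converted to numbers before adding.

```javascript
'use strict';

// Hypothetical transformation: pass three fields through and
// append a computed total of the two numeric columns.
function addTotal(data) {
    // Parsed CSV values are assumed to be strings here,
    // so convert them before doing arithmetic.
    const total = Number(data.q1) + Number(data.q2);
    return [ data.region, data.q1, data.q2, String(total) ];
}

const result = addTotal({ region: 'west', q1: '10', q2: '32' });
console.log(result.join(','));
```

With a function like this in the second stage, outputFields would need a fourth name for the computed column.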

The last stage outputs the CSV using the column names you specify in outputFields.

Since the process uses pipes it is extremely memory efficient. In an earlier version of this script I used a variant of the CSV parser which read the entire CSV into an array before processing could occur. For a large CSV file the Node.js process ran out of memory, and I had to learn how to adjust the Node.js heap size. With pipes the memory footprint at any one time is minimal.
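The row-at-a-time behavior can be seen in a small sketch that uses only the built-in Node.js stream module, not the CSV packages. Everything in it (the rows generator, the upperRow function, the sample values) is made up for illustration. Rows are produced one by one, pass through an object-mode Transform, and are printed, without ever being collected into an array. It does not measure memory; it only shows the shape of the processing.

```javascript
'use strict';

const { Readable, Transform } = require('stream');

// Hypothetical per-row transformation: uppercase every field.
function upperRow(row) {
    return row.map(v => v.toUpperCase());
}

// Object-mode Transform stream: receives one row, emits one row.
const upper = new Transform({
    objectMode: true,
    transform(row, _encoding, callback) {
        callback(null, upperRow(row));
    }
});

// Generator that produces rows on demand instead of building a list.
function* rows() {
    for (let i = 0; i < 5; i++) {
        yield [ `name${i}`, `city${i}` ];
    }
}

// Readable.from turns the generator into a stream of rows.
Readable.from(rows())
    .pipe(upper)
    .on('data', row => console.log(row.join(',')));
```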
