Simple data export and manipulation using Node.js and the node-mysql module

By: David Herron; Date: 2012-07-10 16:04

Tags: Node.JS » MySQL

What follows is a fairly simple data processing script to extract data from a Drupal installation in a MySQL database. The purpose was to export nodes from a Drupal site, reformatting them for some software I'd written to import content into a blogger blog. In other words, this is the sort of simple data extraction and manipulation tool we write all the time.

The implementation is on the Node.js platform, using the node-mysql module. I'm liking Node.js because JavaScript makes it fairly easy to write simple code.

I suppose that theoretically I should have written this script in PHP and used the Drupal API to access the data. Truthfully, I have a hard time stomaching PHP because it's such a messy language. Dang it, this is the 2010's and we shouldn't still be using languages whose design reeks of the 1980's. Oops, that's sure to cause a flame or two, oh well.

The node-mysql home page, at https://github.com/felixge/node-mysql, says to install node-mysql this way:

npm install mysql async --save

This also installs async, which is useful for simplifying asynchronous coding.

First step, drag in the required modules.

var util  = require('util');
var async = require('async');
var mysql = require('mysql');

Next, set up the database connection:

var client = mysql.createClient({
  user: 'user-name',
  password: 'user-password',
  host: 'database-host',
  database: 'database-name',
});

This much is pretty straightforward. The client object in the 0.9.x version isn't very Node-like in that it doesn't provide event listener interfaces and the like; those are coming in the 2.0.x version instead.
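
For comparison, here's a minimal sketch of the equivalent setup with the 2.0.x API, where the connection is created with createConnection rather than createClient (the credentials are the same placeholders as above; check the node-mysql docs for the version you actually have installed):

var mysql = require('mysql');

var connection = mysql.createConnection({
  host:     'database-host',
  user:     'user-name',
  password: 'user-password',
  database: 'database-name'
});

connection.connect(function(err) {
  if (err) throw err;
  // the connection is ready; issue queries here
});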

The task in this case is to gather two sets of data out of the MySQL database: a) node content, b) taxonomy terms for each node.

Because of how the data is stored and represented in Drupal, this is done with two SQL queries, and in a third step the two sets of data are merged together.

The first query retrieves the taxonomy terms for each node:

SELECT n.nid, td.name
  FROM node n, taxonomy_index ti, taxonomy_term_data td
  WHERE n.nid = ti.nid AND ti.tid = td.tid ORDER BY n.nid

The second query retrieves the node content I want:

SELECT n.nid, n.created, n.title, fdb.body_value
    FROM node n, field_data_body fdb
    WHERE n.nid = fdb.entity_id AND n.status = 1
    ORDER BY n.nid

I don't normally write my SQL this way with keywords capitalized, but thought it would make it easier to read. Anyway, the queries are written for Drupal 7 and assume you're only retrieving the title and body and that there are no other fields. I should have also put in a test against the node type field, but on this specific Drupal instance there was only one node type in use. In any case it produces two sets of output: one is a table showing the node ID and taxonomy term, the other a table showing the node ID, timestamp, title and body.
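
With node-mysql, each result set arrives as an array of row objects keyed by column name, so the two outputs look roughly like this (the values shown here are made up for illustration):

// terms: one row per (node, taxonomy term) pair
[ { nid: 12, name: 'Node.JS' },
  { nid: 12, name: 'MySQL' },
  { nid: 15, name: 'Drupal' } ]

// nodes: one row per published node
[ { nid: 12, created: 1341936240, title: 'A sample post', body_value: '...' },
  { nid: 15, created: 1341940000, title: 'Another post', body_value: '...' } ]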

The full implementation appears below; it's large enough that I don't want to break it into smaller chunks.

The first thing is that I used the async module to simplify what would otherwise have been nested callback functions and code that's harder than necessary to read. This is the basic idea of the structure:

async.series([
  function(cb) {
    // step 1
    // When this step's processing is finished, call cb(null, results).
    // If instead an error is found, call cb(err).
  },
  function(cb) {
    // step 2: same pattern
  },
  function(cb) {
    // step 3
  }
]);

The async module uses the callback function to coordinate running one step at a time of a longer process. It doesn't matter how far down into inner callbacks a step goes; when it calls the async callback function, that step is done and the next one begins.
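
To make that concrete, here is a minimal, self-contained sketch (the console messages and timer delays are invented for illustration):

var async = require('async');

async.series([
  function(cb) {
    // step 1: simulate asynchronous work with a timer
    setTimeout(function() {
      console.log('step 1 done');
      cb(null);
    }, 100);
  },
  function(cb) {
    // step 2: does not start until step 1 has called cb
    setTimeout(function() {
      console.log('step 2 done');
      cb(null);
    }, 50);
  }
], function(err) {
  console.log('all steps finished');
});

Even though step 2's timer is shorter, "step 1 done" always prints first, because async doesn't start step 2 until step 1 calls its callback.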

In order to facilitate passing data from step 1 and step 2 to the final step, I used a pair of global variables. The async module does provide a mechanism for passing data from one step to the next, but I didn't feel like using it, and instead did it this way.
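
For reference, that mechanism looks like this: each step hands its result to cb, and an optional final callback receives all the results as an array (the placeholder values here are hypothetical):

async.series([
  function(cb) {
    cb(null, 'step 1 result');    // e.g. the taxonomy rows
  },
  function(cb) {
    cb(null, 'step 2 result');    // e.g. the node rows
  }
], function(err, results) {
  // results[0] is step 1's value, results[1] is step 2's value
  console.log(results);
});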

In any case you can see that the MySQL query is done using this pattern:

client.query('SQL statement',
    function(err, results, fields) {
        // process the data
    });

In node-mysql 2.0.x there is a new way of interacting with a query, using event handler functions so that, for example, you can process one row at a time rather than waiting for the complete result set to arrive.
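
Here's a minimal sketch of that style, based on the node-mysql 2.x documentation (verify the event names against the version you have installed):

var mysql = require('mysql');

var connection = mysql.createConnection({
  host:     'database-host',
  user:     'user-name',
  password: 'user-password',
  database: 'database-name'
});

var query = connection.query('select n.nid, n.title from node n where n.status = 1');
query
  .on('error', function(err) {
    // query-level errors arrive here instead of in a callback
    console.error(err);
  })
  .on('result', function(row) {
    // one row at a time, rather than a single results array
    console.log(row.nid, row.title);
  })
  .on('end', function() {
    // all rows have been delivered
    connection.end();
  });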

That's about it -- three steps, two to retrieve data, the third to merge the data into one representation. It should give you an idea or two of how to work with MySQL and node-mysql in Node.js.

var terms = undefined;   // results of the taxonomy query (step 1)
var nodes = undefined;   // results of the node content query (step 2)

async.series([

function(cb) {
  // Step 1: retrieve the taxonomy term names for every node
  client.query(
    'select n.nid, td.name from node n, taxonomy_index ti, taxonomy_term_data td where n.nid = ti.nid and ti.tid = td.tid order by n.nid',
    function selectCb(err, results, fields) {
      if (err) {
        return cb(err);
      }

      terms = results;
      cb(null);
    }
  );
},

function(cb) {
  // Step 2: retrieve the published nodes with their body text
  client.query(
    'select n.nid, n.created, n.title, fdb.body_value from node n, field_data_body fdb where n.nid = fdb.entity_id and n.status = 1 order by n.nid',
    function selectCb(err, results, fields) {
      if (err) {
        return cb(err);
      }

      nodes = results;
      cb(null);
    }
  );
},

function(cb) {
  // Step 3: merge the two result sets, printing each node as an item
  // carrying its title, body, date, and any taxonomy terms with a matching nid
  for (var key in nodes) {
    var node = nodes[key];
    util.print("<item>\n");
    util.print('<title>' + node.title + "</title>\n");
    util.print('<description>' + node.body_value + "</description>\n");
    util.print('<date>' + node.created + "</date>\n");
    for (var key2 in terms) {
      if (terms[key2].nid == node.nid) {
        util.print('<tag>' + terms[key2].name + "</tag>\n");
      }
    }
    util.print("</item>\n");
  }
  cb(null);
}

], function(err) {
  // Runs after the three steps complete, or as soon as one step reports an error
  if (err) {
    console.error(err);
  }
  client.end();
});