Simple data export and manipulation using Node.js and the node-mysql module

By: (plus.google.com) +David Herron; Date: 2012-07-10 16:04

Tags: Node.JS » MySQL

What follows is a fairly simple data processing script to extract data from a Drupal installation in a MySQL database. The purpose was to export nodes from a Drupal site, reformatting them for some software I'd written to import content into a blogger blog. In other words, this is the sort of simple data extraction and manipulation tool we write all the time.

The implementation is on the Node.js platform, using the node-mysql module. I'm liking Node.js because JavaScript makes it fairly easy to write simple code.

I suppose that theoretically I should have written this script in PHP and used the Drupal API to access the data. Truthfully, I have a hard time stomaching PHP because it's such a messy language. Dang it, this is the 2010's and we shouldn't still be using languages whose design reeks of the 1980's. Oops, that's surely to cause a flame or two, oh well.

The real problem I ran into is that the node-mysql home page, at (github.com) https://github.com/felixge/node-mysql (link is external), says to install node-mysql this way:

npm install mysql async --save

This also installs async which is useful for simplifying asynchronous coding.

First step, drag in the required modules.

var util  = require('util');
var async = require('async');
var mysql = require('mysql');

Next set up the database connection

var client = mysql.createClient({
  user: 'user-name',
  password: 'user-password',
  host: 'database-host',
  database: 'database-name',
});

This much is pretty straight forward. The client object in the 0.9.x version isn't very Node-like in that it doesn't provide event listener interfaces etc. Those are coming in the 2.0.x version instead.

The task in this case is to gather two sets of data out of the MySQL database: a) node content, b) taxonomy terms for each node.

Because of how it's stored, represented in Drupal, etc, this is done with two SQL queries, and in a third step the two sets of data will be merged together.

The first query retrieves the node data I want:

SELECT n.nid, td.name
  FROM node n, taxonomy_index ti, taxonomy_term_data td
  WHERE n.nid = ti.nid AND ti.tid = td.tid ORDER BY n.nid
The second query retrieves the taxonomy data I want:
SELECT n.nid, n.created, n.title, fdb.body_value
    FROM node n, field_data_body fdb
    WHERE n.nid = fdb.entity_id AND n.status = 1
    ORDER BY n.nid

I don't normally write my SQL this way with keywords capitalized, but thought it would make it easier to read. Anyway, the queries are written for Drupal 7 and assumes you're only retrieving the title and body and that there are no other fields. I should have also put in a test against the node type field, but on this specific Drupal instance there was only one node type in use. In any case it provides two sets of outputs, one being a table showing the node ID and taxonomy term, the other being a table showing the node ID, time stamp, title and body.

The implementation follows and is large enough I don't want to break it into smaller chunks.

The first thing is that I used the async module to simplify what would have been nested callback functions and code that's harder-than-necessary to read. This is the basic idea of the structure:-

async.series([
  function(cb) {
step 1
     In this function, when processing is finished, call cb(null, results)
     If instead an error is found, call cb(err)
  },
  function(cb) {
step 2
     Same
  },
  function(cb) {
step 3
  }]);

The async module uses the callback function to coordinate running one step at a time of a longer process. It doesn't matter how far down into inner callbacks that step goes, when it calls the async callback function that step is done and the next one is begun.

In order to facilitate passing data from step 1 and step 2 to the final step, I used a pair of global variables. The async module does provide a mechanism for massing data from one step to the next, but I didn't feel like using it, and instead did it this way.

In any case you can see that the MySQL query is done using this pattern:

client.query('SQLE',
    function(err, results, fields) {
        process the data
    });

In node-mysql 2.0.x you have a new way of interacting with the query, using asynchronous oriented event handler functions so that for example you can process one row at a time rather than waiting for the complete query to be retrieved.

That's about it -- three steps, two to retrieve data, the third to merge the data into one representation. It should give you an idea or two of how to work with MySQL and node-mysql in Node.js.

var terms = undefined;
var nodes = undefined;

async.series([

function(cb) {
 client.query(
   'select n.nid, td.name from node n, taxonomy_index ti, taxonomy_term_data td where n.nid = ti.nid and ti.tid = td.tid order by n.nid',
   function selectCb(err, results, fields) {
     if (err) {
       throw err;
     }

     terms = results;
     client.end();
     cb(null);
   }
 );
},

function(cb) {
 client.query(
   'select n.nid, n.created, n.title, fdb.body_value from node n,field_data_body fdb where n.nid = fdb.entity_id and n.status = 1 order by n.nid',
   function selectCb(err, results, fields) {
     if (err) {
       throw err;
     }

     nodes = results;
     client.end();
     cb(null);
   }
 );
},

function(cb) {
  for (var key in nodes) {
    var node = nodes[key];
    util.print("\n
    util.print('title+ node.title +"\n");
    util.print('description+ node.body_value +"\n");
    util.print('date+ node.created +"\n");
    for (var key2 in terms) {
      if (terms[key2].nid == node.nid) {
        util.print('tag+ terms[key2].name +"\n");
      }
    }
    util.print("\n
  }
}

]);
« Overriding console.log in Node.js, and other thoughts about logging in Node apps Potential for integrating Node.js with Drupal and speed up Drupal page processing »