Simple data export and manipulation using Node.js and the node-mysql module

By: (plus.google.com) +David Herron; Date: 2012-07-10 16:04

Tags: Node.JS » MySQL

What follows is a fairly simple data processing script to extract data from a Drupal installation in a MySQL database. The purpose was to export nodes from a Drupal site, reformatting them for some software I'd written to import content into a blogger blog. In other words, this is the sort of simple data extraction and manipulation tool we write all the time.

The implementation is on the Node.js platform, using the node-mysql module. I'm liking Node.js because JavaScript makes it fairly easy to write simple code.

I suppose that theoretically I should have written this script in PHP and used the Drupal API to access the data. Truthfully, I have a hard time stomaching PHP because it's such a messy language. Dang it, this is the 2010's and we shouldn't still be using languages whose design reeks of the 1980's. Oops, that's surely to cause a flame or two, oh well.

The real problem I ran into is that the node-mysql home page, at (github.com) https://github.com/felixge/node-mysql (link is external), says to install node-mysql this way:

npm install mysql async --save

This also installs async which is useful for simplifying asynchronous coding.

First step, drag in the required modules.

var util  = require('util');
var async = require('async');
var mysql = require('mysql');

Next set up the database connection

var client = mysql.createClient({
  user: 'user-name',
  password: 'user-password',
  host: 'database-host',
  database: 'database-name',
});

This much is pretty straight forward. The client object in the 0.9.x version isn't very Node-like in that it doesn't provide event listener interfaces etc. Those are coming in the 2.0.x version instead.

The task in this case is to gather two sets of data out of the MySQL database: a) node content, b) taxonomy terms for each node.

Because of how it's stored, represented in Drupal, etc, this is done with two SQL queries, and in a third step the two sets of data will be merged together.

The first query retrieves the node data I want:

SELECT n.nid, td.name
  FROM node n, taxonomy_index ti, taxonomy_term_data td
  WHERE n.nid = ti.nid AND ti.tid = td.tid ORDER BY n.nid
The second query retrieves the taxonomy data I want:
SELECT n.nid, n.created, n.title, fdb.body_value
    FROM node n, field_data_body fdb
    WHERE n.nid = fdb.entity_id AND n.status = 1
    ORDER BY n.nid

I don't normally write my SQL this way with keywords capitalized, but thought it would make it easier to read. Anyway, the queries are written for Drupal 7 and assumes you're only retrieving the title and body and that there are no other fields. I should have also put in a test against the node type field, but on this specific Drupal instance there was only one node type in use. In any case it provides two sets of outputs, one being a table showing the node ID and taxonomy term, the other being a table showing the node ID, time stamp, title and body.

The implementation follows and is large enough I don't want to break it into smaller chunks.

The first thing is that I used the async module to simplify what would have been nested callback functions and code that's harder-than-necessary to read. This is the basic idea of the structure:-

async.series([
  function(cb) {
step 1
     In this function, when processing is finished, call cb(null, results)
     If instead an error is found, call cb(err)
  },
  function(cb) {
step 2
     Same
  },
  function(cb) {
step 3
  }]);

The async module uses the callback function to coordinate running one step at a time of a longer process. It doesn't matter how far down into inner callbacks that step goes, when it calls the async callback function that step is done and the next one is begun.

In order to facilitate passing data from step 1 and step 2 to the final step, I used a pair of global variables. The async module does provide a mechanism for massing data from one step to the next, but I didn't feel like using it, and instead did it this way.

In any case you can see that the MySQL query is done using this pattern:

client.query('SQLE',
    function(err, results, fields) {
        process the data
    });

In node-mysql 2.0.x you have a new way of interacting with the query, using asynchronous oriented event handler functions so that for example you can process one row at a time rather than waiting for the complete query to be retrieved.

That's about it -- three steps, two to retrieve data, the third to merge the data into one representation. It should give you an idea or two of how to work with MySQL and node-mysql in Node.js.

var terms = undefined;
var nodes = undefined;

async.series([

function(cb) {
 client.query(
   'select n.nid, td.name from node n, taxonomy_index ti, taxonomy_term_data td where n.nid = ti.nid and ti.tid = td.tid order by n.nid',
   function selectCb(err, results, fields) {
     if (err) {
       throw err;
     }

     terms = results;
     client.end();
     cb(null);
   }
 );
},

function(cb) {
 client.query(
   'select n.nid, n.created, n.title, fdb.body_value from node n,field_data_body fdb where n.nid = fdb.entity_id and n.status = 1 order by n.nid',
   function selectCb(err, results, fields) {
     if (err) {
       throw err;
     }

     nodes = results;
     client.end();
     cb(null);
   }
 );
},

function(cb) {
  for (var key in nodes) {
    var node = nodes[key];
    util.print("\n
    util.print('title+ node.title +"\n");
    util.print('description+ node.body_value +"\n");
    util.print('date+ node.created +"\n");
    for (var key2 in terms) {
      if (terms[key2].nid == node.nid) {
        util.print('tag+ terms[key2].name +"\n");
      }
    }
    util.print("\n
  }
}

]);
« Overriding console.log in Node.js, and other thoughts about logging in Node apps Potential for integrating Node.js with Drupal and speed up Drupal page processing »
2016 Election Acer C720 Ad block AkashaCMS Amazon Amazon Kindle Amazon Web Services America Amiga and Jon Pertwee Android Anti-Fascism AntiVirus Software Apple Apple Hardware History Apple iPhone Apple iPhone Hardware April 1st Arduino ARM Compilation Artificial Intelligence Astronomy Astrophotography Asynchronous Programming Authoritarianism Automated Social Posting AWS DynamoDB AWS Lambda Ayo.JS Bells Law Big Brother Big Finish Bitcoin Mining Black Holes Blade Runner Blockchain Blogger Blogging Books Botnets Cassette Tapes Cellphones China China Manufacturing Christopher Eccleston Chrome Chrome Apps Chromebook Chromebox ChromeOS CIA CitiCards Citizen Journalism Civil Liberties Clinton Cluster Computing Command Line Tools Comment Systems Computer Accessories Computer Hardware Computer Repair Computers Cross Compilation Crouton Cryptocurrency Curiosity Rover Currencies Cyber Security Cybermen Daleks Darth Vader Data backup Data Storage Database Database Backup Databases David Tenant DDoS Botnet Detect Adblocker Developers Editors Digital Photography Diskless Booting Disqus DIY DIY Repair DNP3 Do it yourself Docker Docker MAMP Docker Swarm Doctor Who Doctor Who Paradox Doctor Who Review Drobo Drupal Drupal Themes DVD E-Books E-Readers Early Computers Election Hacks Electric Bicycles Electric Vehicles Electron Emdebian Encabulators Energy Efficiency Enterprise Node EPUB ESP8266 Ethical Curation Eurovision Event Driven Asynchronous Express Face Recognition Facebook Fake News Fedora VirtualBox File transfer without iTunes FireFly Flickr Fraud Freedom of Speech Front-end Development Gallifrey git Github GitKraken Gitlab GMAIL Google Google Chrome Google Gnome Google+ Government Spying Great Britain Heat Loss Hibernate Hoax Science Home Automation HTTP Security HTTPS Human ID I2C Protocol Image Analysis Image Conversion Image Processing ImageMagick In-memory Computing InfluxDB Infrared Thermometers Insulation Internet Internet Advertising Internet Law Internet of Things Internet Policy Internet Privacy iOS Devices iPad iPhone iPhone hacking Iron Man iTunes Java JavaScript JavaScript Injection JDBC John Simms Journalism Joyent Kaspersky Labs Kindle Kindle Marketplace Lets Encrypt LibreOffice Linux Linux Hints Linux Single Board Computers Logging Mac Mini Mac OS Mac OS X Machine Learning Machine Readable ID macOS MacOS X setup Make Money Online March For Our Lives MariaDB Mars Mass Violence Matt Lucas MEADS Anti-Missile Mercurial MERN Stack Michele Gomez Micro Apartments Microsoft Military AI Military Hardware Minification Minimized CSS Minimized HTML Minimized JavaScript Missy Mobile Applications Mobile Computers MODBUS Mondas Monetary System MongoDB Mongoose Monty Python MQTT Music Player Music Streaming MySQL NanoPi Nardole NASA Net Neutrality Network Attached Storage Node Web Development Node.js Node.js Database Node.js Testing Node.JS Web Development Node.x North Korea npm NVIDIA NY Times Online advertising Online Community Online Fraud Online Journalism Online Photography Online Video Open Media Vault Open Source Open Source Governance Open Source Licenses Open Source Software OpenAPI OpenVPN Palmtop PDA Patrick Troughton Paywalls Personal Flight Peter Capaldi Phishing Photography PHP Plex Plex Media Server Political Protest Postal Service Power Control Privacy Production use Public Violence Raspberry Pi Raspberry Pi 3 Raspberry Pi Zero ReactJS Recaptcha Recycling Refurbished Computers Remote Desktop Removable Storage Republicans Retro Computing Retro-Technology Reviews RFID Right to Repair River Song Robotics Rocket Ships RSS News Readers rsync Russia Russia Troll Factory Russian Hacking Rust SCADA Scheme Science Fiction SD Cards Search Engine Ranking Season 1 Season 10 Season 11 Security Security Cameras Server-side JavaScript Serverless Framework Servers Shell Scripts Silence Simsimi Skype SmugMug Social Media Social Media Warfare Social Network Management Social Networks Software Development Space Flight Space Ship Reuse Space Ships SpaceX Spear Phishing Spring Spring Boot Spy Satellites SQLite3 SSD Drives SSD upgrade SSH SSH Key SSL Stand For Truth Strange Parts Swagger Synchronizing Files Telescopes Terrorism The Cybermen The Daleks The Master Time-Series Database Tom Baker Torchwood Total Information Awareness Trump Trump Administration Trump Campaign Twitter Ubuntu Udemy UDOO US Department of Defense Virtual Private Networks VirtualBox VLC VNC VOIP Vue.js Web Applications Web Developer Resources Web Development Web Development Tools Web Marketing Webpack Website Advertising Weeping Angels WhatsApp William Hartnell Window Insulation Windows Windows Alternatives Wordpress World Wide Web Yahoo YouTube YouTube Monetization