Correctly match URL against domain name without killing yourself with regular expressions

By: David Herron; Date: 2017-07-05 21:41

Tags: Node.JS » JavaScript

The Internet relies on domain names as a friendlier, more humane addressing mechanism than IP addresses. That means we must often write code that checks whether a URL is "within" domain A or domain B and acts accordingly. You might think a regular expression is the way to go, but it has failings: while a URL looks like a text string, it's actually a data structure. A domain name comparison has to recognize that it's dealing with a data structure and compare accordingly. Otherwise a URL whose domain name merely ends with the same characters as the target domain might match a regular expression such as /amazon\.com$/i and do the wrong thing.
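To make the "data structure" point concrete, here is a minimal sketch that splits a URL into its parts using the WHATWG URL class that is built into current Node.js releases. The URL is a made-up example, and the `parts` object is only there for illustration.

```javascript
// The WHATWG URL class is available as a global in current Node.js releases.
// The URL below is a made-up example chosen for illustration.
const parsed = new URL('https://WWW.Example.COM:8080/some/path?tag=abc#top');

const parts = {
    protocol: parsed.protocol,  // scheme, including the trailing colon
    hostname: parsed.hostname,  // host name only, normalized to lower case
    port: parsed.port,          // port number as a string
    pathname: parsed.pathname,  // path portion
    search: parsed.search,      // query string, including the leading '?'
    hash: parsed.hash           // fragment, including the leading '#'
};

console.log(parts);
```

The hostname is only one field among several, which is why a pattern applied to the whole string, or even to the hostname as plain text, can be fooled.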

The task at hand is scanning website content for links to affiliate partners and making sure the links have rel=nofollow, affiliate tags, and so on. The work is being done for the AkashaCMS Affiliate Links plugin, which simplifies making affiliate links in an AkashaCMS website.

I had the following loop:

const url = require('url');

let href = ... the href= attribute of the link to modify
let urlP = url.parse(href, true, true);
[
    { country: "com", domain: /amazon\.com$/i },
    { country: "ca",  domain: /amazon\.ca$/i },
    { country: "co-jp",  domain: /amazon\.co\.jp$/i },
    { country: "co-uk",  domain: /amazon\.co\.uk$/i },
    { country: "de",  domain: /amazon\.de$/i },
    { country: "es",  domain: /amazon\.es$/i },
    { country: "fr",  domain: /amazon\.fr$/i },
    { country: "it",  domain: /amazon\.it$/i }
].forEach(amazonSite => {
    // Look up the affiliate code configured for this country, if any
    let amazonCode = getAmazonAffiliateCodeForCountry(amazonSite.country);
    // Only act when the hostname matches and a code is configured
    if (amazonSite.domain.test(urlP.hostname) && amazonCode) {
        ... operate on the link
    }
});

The code as it stands "works" to a degree. It knows a set of Amazon domains, and uses the regular expression to match against the hostname portion of the URL.

But as I noted in the introduction, this doesn't match the domain name properly. Yes, I've made sure to use the case-insensitive modifier (i) and to escape the . characters, so the pattern matches the literal domain name text. But did I prevent it from matching an unrelated domain whose name merely ends with the same text? Nope.
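A quick sketch of the problem, using the same pattern as the "com" entry in the loop above. The hostnames are hypothetical and chosen only to show which ones the pattern accepts.

```javascript
// Same pattern as the "com" entry in the loop above.
const amazonCom = /amazon\.com$/i;

// Hypothetical hostnames chosen for illustration.
const hostnames = [
    'www.amazon.com',         // intended match
    'amazon.com',             // intended match
    'notamazon.com',          // a different domain that happens to end the same way
    'amazon.com.example.net'  // amazon.com is not at the end of the name
];

// Record whether the pattern accepts each hostname.
const results = hostnames.map(h => ({ hostname: h, matched: amazonCom.test(h) }));
console.log(results);
```

The third entry is the troublesome one: the pattern is anchored at the end of the string but says nothing about what comes before "amazon", so a different domain slips through.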

What's desired is for the match to work like a domain name match should work. While I'm sure the predominant technique for matching domain names is regular expressions, they aren't a good mechanism for matching domain names.

For example, you want to match amazon.com itself and any subdomain of amazon.com. One could encode a more complete match in a more comprehensive regular expression, e.g. /^amazon\.com$|.*\.amazon\.com$/i, and an expression like that might work. But as you start accounting for more corner cases, the regular expression becomes more and more complex. You're on a slippery slope into regular expression hell, and perhaps it's necessary to take a step back and consider the situation.
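Taking that step back, here is a minimal sketch of what "the domain itself or any subdomain" looks like when the URL is parsed first and only the hostname is compared. The helpers `hostWithinDomain` and `urlWithinDomain` are hypothetical names written for this sketch, not the API of any published package, and the URLs are made up.

```javascript
// hostWithinDomain and urlWithinDomain are hypothetical helpers for this sketch.
function hostWithinDomain(hostname, domain) {
    const host = hostname.toLowerCase();
    const dom = domain.toLowerCase();
    // Accept the domain itself, or any name that ends with "." + domain.
    return host === dom || host.endsWith('.' + dom);
}

function urlWithinDomain(href, domain) {
    let parsed;
    try {
        parsed = new URL(href);   // throws on strings that are not absolute URLs
    } catch (err) {
        return false;
    }
    return hostWithinDomain(parsed.hostname, domain);
}

console.log(urlWithinDomain('https://www.amazon.com/some/page', 'amazon.com'));
console.log(urlWithinDomain('https://notamazon.com/some/page', 'amazon.com'));
```

Requiring the "." separator before the domain is what rules out look-alike names. The sketch deliberately ignores corner cases such as trailing dots and relative URLs.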

Wouldn't a wildcard match expression using * make more sense? In other words, doesn't it make more sense to rewrite the above loop like so?

let href = ... the href= attribute of the link to modify
[
    { country: "com", domain: '*' },
    { country: "ca",  domain: '*' },
    { country: "co-jp",  domain: '*' },
    { country: "co-uk",  domain: '*' },
    { country: "de",  domain: '*' },
    { country: "es",  domain: '*' },
    { country: "fr",  domain: '*' },
    { country: "it",  domain: '*' }
].forEach(amazonSite => {
    // Look up the affiliate code configured for this country, if any
    let amazonCode = getAmazonAffiliateCodeForCountry(amazonSite.country);
    // domainMatch takes the full URL, so no separate parsing step is needed
    if (domainMatch(amazonSite.domain, href) && amazonCode) {
        ... operate on the link
    }
});

The question is where to get the domainMatch function.

Try the domain-match package.

Usage is as above, or:

var domainMatch = require('domain-match');
var matched = domainMatch('*', '');
// matched == true

In other words, you don't even have to parse the URL; the domainMatch function does it for you. More importantly, it does domain name matching the way it's supposed to be done. The matching expression in this case is simple, straightforward, and natural to the task of matching domain names.

$ node
> const domainMatch = require('domain-match');
> domainMatch('*', '');
> domainMatch('*', '');

Even more interesting, it matches not just the domain name but also the other parts of the URL. In this case, changing prefix to prefix2 caused the URL comparison to not match.

A related package

The domain-match package is what came up first in my search. Another package popped up in a broader search:

It's curious that domain-match is so thinly used. Why aren't there more packages of this sort? Or does everyone just use regular expressions or, even worse, simple string comparison?
