You can joyfully parse and manipulate URLs in browser-based JavaScript

Date: Wed Aug 20 2014

Tags: JavaScript

URLs are not strings; they are a data structure that is represented as a string. How do you easily and reliably manipulate a URL string programmatically? Do you use regular expressions or other kinds of string manipulation? Given all the ways to encode data in a URL, how do you ensure it remains syntactically correct while doing string manipulation? Manipulating URLs with regular expressions is difficult because a URL has several components (scheme, host, path, query string, fragment), each with its own delimiters and encoding rules. It's better to manipulate a URL as a data structure, letting software change individual URL fields while ensuring the result is syntactically correct.

While many languages have built-in classes that assist working with a URL as a data structure, some do not, and JavaScript is one of them. In Java, the java.net.URL class has excellent capabilities to parse, manipulate and render URLs, meaning that a program can take a URL and work on it as data rather than as a string. As of this writing, the JavaScript language has no such built-in class.

What if you want to remove a query parameter from the URL, accommodating the various ways the parameter can be encoded? Or change the value of a query parameter? If you're limited to string functions, those tasks are more complex than they might seem.
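To make the difficulty concrete, here is a small sketch of what can go wrong with a purely string-based approach. The naiveRemove function and the example.com URLs are made up for illustration; they are not taken from the demo discussed later.

```javascript
// Naive approach: delete "name=value" with a regular expression.
// naiveRemove is a hypothetical helper, shown only to expose the pitfalls.
function naiveRemove(url, name) {
    return url.replace(new RegExp(name + '=[^&]*'), '');
}

// Pitfall 1: a stray '&' is left right after the '?'.
var strayAmp = naiveRemove('http://example.com/list?sort_by=name&page=2', 'sort_by');
// strayAmp === 'http://example.com/list?&page=2'

// Pitfall 2: the pattern matches inside a different parameter
// whose name merely ends in "sort_by".
var wrongMatch = naiveRemove('http://example.com/list?resort_by=x&page=2', 'sort_by');
// wrongMatch === 'http://example.com/list?re&page=2'
```

Each pitfall can be patched with a more elaborate pattern, but every patch adds another case to get wrong, which is the argument for treating the URL as data instead.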

Does that mean a JavaScript programmer can only manipulate a URL using string operations? No.

For a Node.js programmer, that platform has a url module which provides very nice URL manipulation functions. But what about JavaScript in a web browser? The JavaScript language doesn't have a URL class, so what can you do in a browser?

var cururl = window.location.href;
// parse the URL using this cool trick
var parser = document.createElement('a');
parser.href = cururl;
// var a = $('<a>', { href: cururl })[0]; -- alternate way w/ jQuery

What's happening here? First, window.location.href is the definitive way to get the current URL shown in the browser location box, so that's what we start with. Second, we're relying on the fact that an a element exposes the same kind of URL fields as the browser's Location object.

We create an a element, then assign the URL string we retrieved to the href field of that element. The browser parses the string, and the a element then exposes the same kind of URL fields found on the Location object (protocol, hostname, port, pathname, search, hash), which lets us treat this URL as a data structure. Note that this a element is not displayed anywhere in the browser window. It's an off-screen object we're using solely for its ability to manipulate URLs. See the documentation on developer.mozilla.org for details.
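As a rough illustration of the fields this gives us, the following sketch copies them into a plain object for inspection. The describeUrl helper is a made-up name, and it accepts any object carrying these fields so it can be tried outside a browser as well.

```javascript
// Copy the URL fields off a parser object into a plain object.
// In a browser, pass the off-screen a element described above.
// describeUrl is a hypothetical helper, not part of any browser API.
function describeUrl(parser) {
    return {
        protocol: parser.protocol,   // scheme, with a trailing ":"
        hostname: parser.hostname,   // host name without the port
        port:     parser.port,       // port as a string, possibly empty
        pathname: parser.pathname,   // path portion of the URL
        search:   parser.search,     // query string, with a leading "?"
        hash:     parser.hash        // fragment, with a leading "#"
    };
}

// Browser usage (needs a DOM, so it is shown as a comment):
//   var parser = document.createElement('a');
//   parser.href = window.location.href;
//   var parts = describeUrl(parser);
```

Assigning to any of these fields on the a element changes the corresponding part of parser.href, which is what the rest of this article relies on.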

In an example I've developed, http://webmaster-tips.davidherron.com/javascript-ui/togglr/test1.html, I wanted to manipulate the query parameters, adding or removing parameters depending on which buttons the user clicks. The parser object gives us a nice data structure with fields and methods, but its search field is a simple string rather than an array, so when it comes to manipulating the query string we're back to string functions.

Here's what I ended up doing:

var srch = [];
if (parser.search.length > 0) {
    // Skip the leading '?' and split into name=value strings
    srch = parser.search.substring(1).split('&');
}

Query strings are punctuated by ampersands, so this code gives us an array of the parameters in name=value format. The substring clause is because the search string has a ? as the very first character, so we need to skip past that character.
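The same step can be wrapped in a small function that also tolerates a few untidy inputs. This is only a sketch: parseSearch is a made-up name, and dropping empty pieces (from "&&" or a trailing "&") is my own choice, not something the snippet above does.

```javascript
// Turn a search string such as "?a=1&b=2" into ["a=1", "b=2"].
// parseSearch is a hypothetical helper name.
function parseSearch(search) {
    if (!search || search === '?') {
        return [];
    }
    // Skip the leading '?' if there is one.
    var body = search.charAt(0) === '?' ? search.substring(1) : search;
    var parts = body.split('&');
    var out = [];
    for (var i = 0; i < parts.length; i++) {
        // Drop empty pieces produced by "&&" or a trailing "&".
        if (parts[i].length > 0) {
            out.push(parts[i]);
        }
    }
    return out;
}
```

In a browser this would be called as parseSearch(parser.search).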

The result is we have the search string converted into a data structure (an array). Doing deeper work on individual query parameters would be done by splitting each of these strings on the first = character.
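Splitting on the first = character might look like the sketch below. The splitParam name is made up, and treating + as a space is an assumption that holds for form-style query strings but may not suit every URL.

```javascript
// Split one "name=value" string on the first '=' and decode both halves.
// splitParam is a hypothetical helper name.
function splitParam(pair) {
    var eq = pair.indexOf('=');
    var rawName  = eq === -1 ? pair : pair.substring(0, eq);
    var rawValue = eq === -1 ? ''   : pair.substring(eq + 1);
    return {
        // Assumption: '+' stands for a space, as in HTML form submissions.
        name:  decodeURIComponent(rawName.replace(/\+/g, ' ')),
        value: decodeURIComponent(rawValue.replace(/\+/g, ' '))
    };
}
```

Splitting on the first = rather than every = keeps values such as expr=1=2 intact. Note that decodeURIComponent throws a URIError on malformed percent sequences, so real code may want a try/catch around it.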

The general idea we'll follow next is to build a second array holding the new set of query parameters. We can then render that array as a string and assign it back to the parser object, which then gives us the new URL string. Let's see how to do all that.

var nsrch = [];
// Then remove the current sorting params
for (var j = 0; j < srch.length; j++) {
    var val = srch[j];
    if (!(   val.indexOf('sort_by') === 0
          || val.indexOf('sort_order') === 0
          || val.indexOf('field_isphev_value_many_to_one') === 0)
        ) {
        nsrch.push(val);
    }
}

This snippet builds a second array from the first, skipping the query parameters we want to drop. In other words, the nsrch array holds a subset of the original query parameters. Note that indexOf(...) === 0 is a prefix test, so it also drops any parameter whose name merely begins with one of these strings.
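If matching on the exact parameter name is preferred over a prefix test, the filtering step could be sketched as follows. The removeParams name and its arguments are hypothetical, and this is a variation on the loop above rather than the code used in the demo.

```javascript
// Return a new array without the parameters whose name is listed in names.
// removeParams is a hypothetical helper name.
function removeParams(pairs, names) {
    var kept = [];
    for (var i = 0; i < pairs.length; i++) {
        var eq = pairs[i].indexOf('=');
        // The name is everything before the first '=' (or the whole string).
        var name = eq === -1 ? pairs[i] : pairs[i].substring(0, eq);
        if (names.indexOf(name) === -1) {
            kept.push(pairs[i]);
        }
    }
    return kept;
}
```

With this version, removeParams(srch, ['sort_by', 'sort_order']) would keep a parameter named sort_by_date, whereas the prefix test would drop it.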

// sortby and sortorder are assumed to have been set earlier in the script
if (sortorder) {
    nsrch.push('sort_by='+ sortby);
    nsrch.push('sort_order='+ sortorder);
}

To add parameters, push each name=value string onto the array, as the snippet above does.
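When a value might contain characters such as & or =, it is safer to encode it before pushing. A minimal sketch, with addParam as a made-up helper name:

```javascript
// Append an encoded "name=value" string to the array and return the array.
// addParam is a hypothetical helper name.
function addParam(pairs, name, value) {
    pairs.push(encodeURIComponent(name) + '=' + encodeURIComponent(value));
    return pairs;
}
```

For simple values like sort_by=name the result is the same as plain concatenation; the difference only shows up for values that would otherwise break the query string.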

if (nsrch.length > 0) {
    parser.search = '?'+ nsrch.join('&');
}

The new search string is generated simply by joining the nsrch array with ampersands, as in the snippet above.

As if by magic, the parser.href field automatically reflects the new search string. There are a couple of corner cases where the URL can end up with ?null or a bare ? tacked onto the end. The demo page linked above has additional code to deal with those cases.
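One way to guard against those corner cases is to build the search string in a function that never returns a bare ? or a null entry. This is only a sketch under my own assumptions; buildSearch is a made-up name, and I have not checked how the demo page actually handles these cases.

```javascript
// Render an array of "name=value" strings as a search string.
// Returns '' when nothing is left, so no bare '?' is produced.
// buildSearch is a hypothetical helper name.
function buildSearch(pairs) {
    var kept = [];
    for (var i = 0; i < pairs.length; i++) {
        // Skip null, undefined, and empty entries so they never reach the URL.
        if (pairs[i] !== null && pairs[i] !== undefined && pairs[i] !== '') {
            kept.push(pairs[i]);
        }
    }
    return kept.length > 0 ? '?' + kept.join('&') : '';
}

// Browser usage (needs a DOM, so it is shown as a comment):
//   parser.search = buildSearch(nsrch);
//   var newurl = parser.href;
```

Whether assigning an empty string to parser.search removes the ? entirely may vary between browsers, so that is worth verifying in the browsers you target.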

About the Author(s)

David Herron: David Herron is a writer and software engineer focusing on the wise use of technology. He is especially interested in clean energy technologies like solar power, wind power, and electric cars. David worked for nearly 30 years in Silicon Valley on software ranging from electronic mail systems, to video streaming, to the Java programming language, and has published several books on Node.js programming and electric vehicles.