Tags: Search Engine Optimization
IndexNow lets us tell search engines about pages that have been added, modified, or deleted. Search engines promise to respond quickly to IndexNow submissions.
Ideally, search engines would automatically scan our websites and correctly understand the content. In the real world, we are expected to help them. That means guiding search engines to scan (or crawl) certain pages and avoid others. For years the best practice was to publish a Sitemap file on our website. But there's a new tool available, IndexNow, which promises fast updates to search engine results pages and is roughly similar to the old ping protocol.
In the beginning there was the Ping Protocol, which I remember using, though I don't remember the specifics. I have a vague memory of using wget or the like to make a GET request against a URL on the search engine, giving it my website URL, telling the search engine about new content. But spammers abused the ping protocol, and eventually it was dropped by search engines.
Later, RSS feeds were developed: data files listing recent content that search engines could discover and query. Another data file, the Sitemap, lists all URLs the website owner wants search engines to crawl. Search engines promised years ago that they would crawl anything listed in Sitemap or RSS files, and that we no longer needed to ping them. Instead the search engines would magically do the right thing using the data in those files.
IndexNow is a new protocol that is roughly like Ping: a modern implementation of the same idea, with authentication that should reduce or eliminate the possibility of misuse.
What is IndexNow
To be more precise about IndexNow's purpose, the website has this to say:
IndexNow is an easy way for websites owners to instantly inform search engines about latest content changes on their website. In its simplest form, IndexNow is a simple ping so that search engines know that a URL and its content has been added, updated, or deleted, allowing search engines to quickly reflect this change in their search results.
Without IndexNow, it can take days to weeks for search engines to discover that the content has changed, as search engines don’t crawl every URL often. With IndexNow, search engines know immediately the URLs that have changed, helping them prioritize crawl for these URLs and thereby limiting organic crawling to discover new content.
In other words, IndexNow is a throwback to the old Ping protocol, but with security improvements and a better implementation. The benefit is reducing the time for website updates to show up in search results.
How were we supposed to learn about IndexNow
All that is great, but to implement it we must first know it exists. I learned about it by accident. I saw no notifications about IndexNow from anyone; instead, I stumbled across it while looking for something else. How were we supposed to have heard about IndexNow? How many website owners do not know about it? Will their websites suffer for not implementing it?
While trying to figure out why Bing (and therefore DuckDuckGo) was not indexing one of my websites, I noticed in the Bing Webmaster Tools a tab labeled IndexNow. That tab told me IndexNow is a new standard for submitting website URLs to search engines, and that we should start using it. In other words, only because I was using the Bing service did I see a mention of IndexNow at all.
My first response was to groan: oh no, another gizmo to learn about. Are the search engines now ignoring Sitemap files, and only looking at data reported through IndexNow? Am I going to have to suddenly learn a complex new thing just so my websites can stay relevant? Going further out on a limb, I wondered whether this was a secret plan by search engines to make individually owned websites run by non-technical folks rank even lower in the search results.
Fortunately the reality of IndexNow is not like that at all. Instead, it is a rather interesting tool, and it's not that complex.
Advantages of using IndexNow
The current best practices are centered on the Sitemap file. But IndexNow proponents tell us that search engines end up repeatedly retrieving the Sitemap file just in case there is a change. In many cases there are no changes, meaning unnecessary website traffic and unnecessary computation for both the search engine and the website. Search engines also repeatedly retrieve individual pages, with similar results. Hitting a URL whose content has not changed is wasted website traffic. That waste adds bandwidth and hardware costs for both the search engine and the website owner. There is both a monetary impact (buying more servers and bandwidth) and an environmental impact from HTTP requests for pages or Sitemap files which may not have changed.
With IndexNow, the website directly notifies search engines about changes, so the search engine can retrieve just the changed pages, reducing the number of wasted page retrievals while quickly updating SERP listings.
The Search Engine Journal Show (on YouTube) posted an interview with Fabrice Canel, a product manager at Microsoft's Bing, describing IndexNow in depth. The interview settled the questions I raised above, and discussed several benefits of the IndexNow protocol. He also said that we can, and should, continue publishing a Sitemap file; the search engines will continue to query for that file just as they do now. This means implementing IndexNow is optional.
An example mentioned in the interview is changing the price on a product sales page. Since search engines today have an area to display price comparisons (something I don't think they should do, but that's for another day), it's useful to update the price data as it changes on the website. Otherwise, the search engine might show an out-of-date price and cause confusion. By informing the search engine that the product listing page has changed, it can crawl that one page, notice the changes (such as the price), and update its index.
IndexNow is simple
IndexNow is very easy to implement. There are two HTTP operations depending on whether you are submitting one URL or many. A simple tool I'll discuss later took less than one day to implement, for example. The two HTTP queries have these purposes:
- Submitting a single URL
- Submitting multiple URLs
Each of these is easy, and can be implemented at the command line using tools like curl or wget.
Authentication does not require registering with the search engine, lowering the barrier to entry. Instead, you generate a simple key on your own, which could be as easy as this:
$ uuid
ca60c1dc-f034-11ec-8f94-d7fa725bb712
This command generates a simple string of random text that is effectively guaranteed to be unique. It happens to match the format requirement for IndexNow keys.
You then deploy that key to a file hosted on your website, and supply the key with every request. The search engine verifies website ownership by retrieving the key file and checking that the keys match.
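To sanity-check the setup, you can retrieve the key file the same way a search engine would. This is just an illustrative curl request, with SITE-DOMAIN-NAME standing in for your domain, and the key file named after the key itself (a convention shown below):
$ curl https://SITE-DOMAIN-NAME/ca60c1dc-f034-11ec-8f94-d7fa725bb712.txt
ca60c1dc-f034-11ec-8f94-d7fa725bb712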
Protocol details are in the documentation on the IndexNow website.
For WordPress, there are at least two plugins implementing the IndexNow API. If you use Cloudflare, there is a new feature called Crawler Hints that is said to include the IndexNow protocol. Other website publishing platforms have added support for IndexNow, or will do so in due course.
Simplifying IndexNow using indexnow-submit
In my case, I use a static website generator I created, called AkashaCMS. My websites therefore do not have any dynamic code running on the server, since I deploy plain HTML/CSS/JS files. That meant I needed a command-line tool for submitting URLs via IndexNow.
I have created a Node.js CLI tool, indexnow-submit, which handles three use cases for submitting URLs to search engines. The package is available through the npm repository at:
https://www.npmjs.com/package/indexnow-submit
The targeted use cases are:
- A single URL to submit to a search engine
- Submitting multiple URLs to a search engine
- Submitting URLs from an RSS or Atom feed to a search engine
It can also retrieve a Sitemap file. Sitemap files and RSS/Atom feeds both serve as sources from which to get the URLs of content on your site.
First, generate a key file. As a convenience, the indexnow-submit package has a shell script containing this:
# This is one way to generate a suitable key file for IndexNow.
# The protocol says the key must be between 8 and 128 characters,
# and be hexadecimal digits or dashes. That fits the format
# for UUIDs. Since UUIDs are guaranteed to be unique, it is
# a convenient and easy way to generate an IndexNow key.
KEY=`uuid`
echo ${KEY} >${KEY}.txt
This of course requires that you're using a UNIX-like system (macOS, Linux, etc.), and have both Bash and uuid installed. There is nothing magical about using uuid, because the key can be any text containing hexadecimal digits and dashes.
It is run like so:
$ sh -x genkey.sh
+ uuid
+ KEY=56ea19dc-f044-11ec-93c0-7f6c2c57f6b3
+ echo 56ea19dc-f044-11ec-93c0-7f6c2c57f6b3
$ cat 56ea19dc-f044-11ec-93c0-7f6c2c57f6b3.txt
56ea19dc-f044-11ec-93c0-7f6c2c57f6b3
Then copy the file to the root directory of your website on your server, satisfying the server end of the authentication.
$ scp 56ea19dc-f044-11ec-93c0-7f6c2c57f6b3.txt USER@WEB-SERVER:/path/to/docroot
Maybe you can use the scp command as shown here, and that's all that is required. But most websites have a more involved deployment strategy. Typically there will be a source code management system, and some kind of deployment process.
With my AkashaCMS-based websites, I store the website source files in a Git repository, and after rendering the website on a local machine I use rsync to deploy to the public web server. In other words, the process might be:
- Copy the key file into the directory containing website source files, making sure the file will be deployed when the site is deployed
- Use Git to commit the key file to the repository
- Run the build process to render the website
- Deploy the website to the server (with rsync, perhaps)
That will give you a copy of the key file in your local directory, and on the website.
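As a concrete sketch of those steps, assuming the website source lives in a site directory, the rendered output lands in an out directory, and deployment uses rsync as described (the directory names here are hypothetical):
$ cp 56ea19dc-f044-11ec-93c0-7f6c2c57f6b3.txt site/
$ git add site/56ea19dc-f044-11ec-93c0-7f6c2c57f6b3.txt
$ git commit -m 'Add IndexNow key file'
$ npm run build
$ rsync --archive out/ USER@WEB-SERVER:/path/to/docroot/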
Once the key is deployed to your server, it is ready to be used for IndexNow queries. Using a tool like wget to submit a single URL via IndexNow looks like this:
$ wget "http://SEARCH-ENGINE/indexnow?url=URL-ENCODED&key=56ea19dc-f044-11ec-93c0-7f6c2c57f6b3"
This is the IndexNow protocol for submitting a single URL. If submitting multiple URLs, you construct a JSON document listing the URLs, then make a POST request. The protocol is documented on the IndexNow website.
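As a rough sketch of that multiple-URL request using curl (SITE-DOMAIN-NAME and the page URLs are placeholders; consult the IndexNow documentation for the authoritative field list):
$ curl -X POST "https://SEARCH-ENGINE/indexnow" \
    -H 'Content-Type: application/json; charset=utf-8' \
    --data '{
        "host": "SITE-DOMAIN-NAME",
        "key": "56ea19dc-f044-11ec-93c0-7f6c2c57f6b3",
        "urlList": [
            "https://SITE-DOMAIN-NAME/page-one.html",
            "https://SITE-DOMAIN-NAME/page-two.html"
        ]
    }'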
Note that IndexNow requests can be executed from any computer, and do not have to be executed from the web server. The request is authenticated by comparing the key provided in the query with the key stored on the website.
Using indexnow-submit makes it this easy:
$ npx indexnow-submit submit-single URL \
--engine SEARCH-ENGINE-DOMAIN \
--key-file 56ea19dc-f044-11ec-93c0-7f6c2c57f6b3.txt
Use either --key-file or --key depending on your preference, but not both. The --key-file option lets you use the key file as shown earlier, while --key takes the key string directly.
If you have multiple URLs in a text file, the indexnow-submit command is this:
$ npx indexnow-submit submit-from-urls FILE-NAME \
-e bing.com \
-h SITE-DOMAIN-NAME \
--key-file 56ea19dc-f044-11ec-93c0-7f6c2c57f6b3.txt
The file containing URLs must be a plain text file, with one URL per line, and no extraneous text beyond the URL. Again, use either --key-file or --key depending on your preference, but not both.
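For example, a hypothetical urls.txt supplied as the FILE-NAME argument would contain nothing but URLs, like so:
https://SITE-DOMAIN-NAME/blog/new-post.html
https://SITE-DOMAIN-NAME/blog/updated-post.html
https://SITE-DOMAIN-NAME/products/new-product.html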
If you want to submit from an RSS feed, run this:
$ npx indexnow-submit submit-from-feed https://SITE-DOMAIN-NAME/rss.xml \
-e bing.com \
-h SITE-DOMAIN-NAME \
--key-file 56ea19dc-f044-11ec-93c0-7f6c2c57f6b3.txt
The RSS feed typically contains the most recent postings from a site, which are almost certainly the URLs we want the search engine to focus on.
Focusing precisely on changed content when submitting URLs with IndexNow
Clearly, the search engines will prefer that we submit only the changed content. In the RSS example, the top one or two entries are likely to be the changed content. The remaining entries are likely to have been changed at an earlier time. What will the search engines do if you repeatedly submit a URL that hasn't changed?
What the IndexNow FAQ says is:
Can I submit the same URL many times a day? Avoid submitting the same URL many times a day. If pages are edited often, then it is preferable to wait 10 minutes between edits before notifying search engines. If pages are updated constantly (examples: time in Waimea, Weather in Tokyo), it’s preferable to not use IndexNow for every change.
The FAQ repeatedly says similar things, that we should only submit URLs for content which has changed.
How you determine which URLs have changed will depend on how your content management system works. There are two simple approaches built into indexnow-submit:
- Selecting Sitemap items by age
- Selecting RSS/Atom items by age
Both Sitemap and RSS/Atom items carry a date string (such as lastmod in a Sitemap, or pubDate in an RSS feed) indicating when the item was created or last changed. This means we can filter those items for ones less than a certain age.
For this purpose indexnow-submit offers the --max-age option for both sitemap-fetch and submit-from-feed. This option takes an ISO 8601 duration specifier, which is used to select items less than the specified age.
For example:
$ npx indexnow-submit submit-from-feed https://SITE-DOMAIN-NAME/rss.xml \
-e bing.com \
-h SITE-DOMAIN-NAME \
--key-file 56ea19dc-f044-11ec-93c0-7f6c2c57f6b3.txt \
--max-age P1D
This will include items less than one day old. Some example duration specifiers are:
- P10D -- 10 days
- P2M -- 2 months
- PT1H -- 1 hour
- PT10M -- 10 minutes
The sitemap-fetch command has the same option, letting you select the relevant Sitemap entries.
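Assuming sitemap-fetch takes the Sitemap URL as its argument (check the package documentation for the exact usage), an invocation might look like this, selecting the Sitemap entries less than seven days old:
$ npx indexnow-submit sitemap-fetch https://SITE-DOMAIN-NAME/sitemap.xml \
    --max-age P7D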
Search engines that support IndexNow
According to the IndexNow FAQ, the current list of supported search engines is:
- Bing -- bing.com
- Seznam.cz -- seznam.cz
- Yandex -- yandex.com
The --engine domain name to use is as shown here. Additionally, you can use api.indexnow.org, and it will automatically forward your submission to all participating search engines.
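For example, a single-URL submission through the shared endpoint is the same GET request shown earlier, pointed at api.indexnow.org (note that the url value must be URL-encoded):
$ curl "https://api.indexnow.org/indexnow?url=https%3A%2F%2FSITE-DOMAIN-NAME%2Fnew-page.html&key=56ea19dc-f044-11ec-93c0-7f6c2c57f6b3"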
Many search engines (like Yahoo or DuckDuckGo) use search data from Bing in their search results. Therefore, updating Bing automatically updates those search engines as well.
There is a certain search engine that is conspicuously absent from this list. You know which search engine it is. There are reports that this 80,000-pound gorilla of a search engine is "testing" IndexNow, and will implement the protocol when it decides to. And, by the way, they remind us that they already do good things around optimizing crawl frequency.
Example process implementation
How you utilize indexnow-submit in your workflow is up to you. This is adapted from one of my websites, and is offered as an example. The website build process is managed using scripts in a package.json file, executed by the npm tool which comes with the Node.js platform.
The build script runs a command which renders website files into the HTML/CSS/JS for the site. The key file is included in a directory that becomes part of the rendered website.
The deploy script is structured as a multi-step process like so:
"deploy": "npm-run-all upload-site indexnow",
"upload-site": "cd out && rsync --archive --verbose ./ USER-NAME@SERVER-DOMAIN:WEB-ROOT-DIRECTORY/",
"indexnow": "npm-run-all indexnow:bing indexnow:seznam indexnow:yandex",
"indexnow:bing": "npx indexnow-submit submit-from-feed ...",
"indexnow:seznam": "npx indexnow-submit submit-from-feed ...",
"indexnow:yandex": "npx indexnow-submit submit-from-feed ...",
The npm-run-all command is equivalent to npm run upload-site && npm run indexnow, meaning it runs the named series of npm script commands.
Therefore the deploy script first uploads with rsync, then executes the three indexnow scripts. Each of those uses the submit-from-feed command shown earlier.
Summary
With indexnow-submit there is an option to use IndexNow even if your web publishing platform does not support the protocol. While it was designed with static website generators in mind, it can be used with a website built on a regular dynamic content management system. For example, during development, testing was conducted against a WordPress site.
There is nothing to fear about IndexNow. We can continue using Sitemaps and everything will be the same as always. What IndexNow adds is targeted notification of specific changed content. If it lives up to the hype, SERP listings will be more up-to-date, which is an excellent thing.