404 errors: The case of the invisible pages

Written on 04 September, 2012 by Uyen Vu
Categories: General Wholesale, Hosting | Tags: errors, online marketing, seo, webmaster tools

Hands up who checks their Webmaster Tools account regularly – keep your hands up if you check your 404 errors. Good. For those who don’t, start checking.

Google Webmaster Tools can tell you a lot about your website, how it is being indexed in Google, how people are interacting with your website and what they are searching for to arrive at your website. We could write a novel about the usefulness of Webmaster Tools, but this article is going to be about those dreaded 404 errors.

Essentially, a 404 error is a standard HTTP response code returned by the web server when the requested page cannot be found. Finding 404 errors on your website is completely normal: new content is always being created and old content removed.

Within Google Webmaster Tools, you can find ‘crawl errors’ under the ‘health’ tab on the left-hand side. This area lists the various types of crawl errors:

  1. Server error – ‘request timed out or site is blocking Google’
  2. Soft 404 – ‘URL does not exist, but your server is not returning a 404 error’
  3. Access denied – ‘Server requires login or is blocking Googlebot’
  4. Not found – ‘URL points to a nonexistent page’
  5. Not followed – ‘URL has active content or there was a problem with redirects’
  6. Other – ‘Google unable to crawl the URL’

So the usual routine here is to set up 301 redirects for any URLs which have moved or had their links changed, or for pages you no longer require, and then ‘mark as fixed’. But what about URLs which are reported as 404s, but which have never actually existed on your website?

If Google finds a URL that points to your domain name anywhere on the internet, it will try to crawl that link whether or not it exists. Your server should return a 404 error if that URL never existed. Don’t be tempted to do a 301 redirect for these. By redirecting them, you are validating them to Google. If the page never existed, then the correct response code is 404 – not 301.
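As a rough illustration of that rule, here is a minimal Python sketch of how a server might choose a response code. The page names, the `LIVE_PAGES` and `MOVED_PAGES` tables and the `respond` function are all made up for this example; your own web server or CMS will have its own way of configuring redirects.

```python
# Hypothetical data for illustration only.
LIVE_PAGES = {"/", "/hosting", "/domains"}   # pages that exist right now
MOVED_PAGES = {"/web-hosting": "/hosting"}   # old path -> new path (really moved)


def respond(path):
    """Return (status_code, redirect_target) for a requested path.

    redirect_target is None unless the status is 301.
    """
    if path in LIVE_PAGES:
        return 200, None                 # page exists: serve it
    if path in MOVED_PAGES:
        return 301, MOVED_PAGES[path]    # page existed and moved: redirect
    return 404, None                     # never existed: plain 404, no redirect


if __name__ == "__main__":
    for path in ("/hosting", "/web-hosting", "/yourp…"):
        print(path, respond(path))
```

The point of keeping the redirect table separate is that only URLs you know were once real pages ever get a 301. Everything else falls through to a 404.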

Links to pages that never existed can come about in lots of different ways. For example:

  1. A typo in a link to your website
  2. Misconfiguration, if they were automatically generated, e.g. from a Content Management System (CMS)
  3. URLs embedded in JavaScript or other embedded content
  4. Text URLs within a page, like a link list or descriptive URL – these do not even need to be complete URLs

The last type, text links within a page, causes the most hassle if you like to keep your reported errors under control. Many forums add an ellipsis to shorten URLs, such as http://www.yourdomain.com/yourp… or http://www.yourdomain.com/yourp…index.html. Google has started attempting to crawl these URLs, even when they are not in anchor tags.
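If you export your ‘Not found’ list, a small script can help separate these shortened URLs from the errors that deserve a closer look. The Python sketch below is only a heuristic and rests on an assumption: that a URL whose path contains an ellipsis character (or three dots) was probably shortened by a forum. The `looks_truncated` function name is our own invention.

```python
from urllib.parse import unquote, urlsplit

# Assumption: an ellipsis in the path suggests a forum-shortened URL.
ELLIPSIS_MARKS = ("…", "...")


def looks_truncated(url):
    """Return True if the URL's path contains an ellipsis marker."""
    # unquote() also catches the percent-encoded form, %E2%80%A6
    path = unquote(urlsplit(url).path)
    return any(mark in path for mark in ELLIPSIS_MARKS)


if __name__ == "__main__":
    reported = [
        "http://www.yourdomain.com/yourp…",
        "http://www.yourdomain.com/yourp…index.html",
        "http://www.yourdomain.com/yourpage.html",
    ]
    for url in reported:
        label = "probably shortened" if looks_truncated(url) else "check manually"
        print(label, url)
```

Anything flagged this way is a candidate for ‘mark as fixed’ rather than a redirect. It is still worth a quick look before you dismiss it.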

These can occur on your own site as well. At Netregistry, we use a type of spam protection on our forms which splits the action URL in half, then joins it back together with JavaScript. This foils spam bots, but Googlebot still tries to crawl the part-URLs, reporting a 404 for pages that were never meant to exist.

As stated on the Google Webmaster Central blog: “We don’t know which URLs are important to you vs. which are supposed to 404, so we show you all the 404s we found on your site and let you decide which, if any, require your attention.” - http://googlewebmastercentral.blogspot.com.au/2011/05/do-404s-hurt-my-site.html

So it is safe to say you can ignore them, or in other words, ‘mark as fixed’. But, you might ask, if you haven’t actually done anything about them, won’t they just come back? Yes and no.

By trying to redirect these non-existent URLs, you’re declaring to Google that they are valid errors – that there was a page there, but now there is not; that you have made a change or a mistake, and will fix it with a 301 redirect. However, by ignoring them you’re essentially telling Google that they got it wrong and these URLs do not need to be fixed. They may come back in the short term, but eventually they should drop off.

To sum it all up, maintaining your Webmaster Tools error list is important. Do not let your 404 errors pile up – nobody likes an error graph that rises and rises every day. Instead, perform regular checkups, redirect those important URLs (remember to use 301 redirects) and safely ignore those that do not need it.

If you would like to learn more about these 404 errors and how to deal with them, please do not hesitate to contact us on 1300 885 884.