Clean Sweep - Fix All Your 404s and Moved Content Easily


Whether you want to make a minor change to your site's mappings or have moved a well-indexed site from another platform to MT, you will run into niggly issues with 404s at some point.

While it's relatively easy to catch some of them (the most popular pages on your site, for example), it's painful trying to work out exactly what everyone is trying to see. It doesn't matter which software you use to parse your server logs: you'll still end up with a lot of unusable data that will take hours to plough through.

Enter Clean Sweep.

Clean Sweep's sole purpose in life is to catch 404s. Not only does it log them in an easy-to-read format, including how many hits there are and when the last one was, but it also offers you a lot of control over the 404s themselves.
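To give a rough idea of the bookkeeping involved, here is a minimal Python sketch that tallies 404s from access log lines, keeping a hit count and the time of the last hit for each path. This is not Clean Sweep's own code (the plugin stores its data in an MT database table); the function name `tally_404s` and the assumption that the lines are in Apache's common or combined log format are mine.

```python
import re
from datetime import datetime

# Pulls the timestamp, request path and status code out of a line in
# Apache's common/combined log format (assumed here, not taken from the plugin).
LINE = re.compile(
    r'\[(?P<when>[^\]]+)\] "(?:\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def tally_404s(lines):
    """Return {path: (hits, last_seen)} for every request that got a 404."""
    seen = {}
    for line in lines:
        m = LINE.search(line)
        if not m or m.group("status") != "404":
            continue  # skip unparseable lines and non-404 responses
        when = datetime.strptime(m.group("when"), "%d/%b/%Y:%H:%M:%S %z")
        hits, last = seen.get(m.group("path"), (0, when))
        seen[m.group("path")] = (hits + 1, max(last, when))
    return seen
```

Feeding it the lines of an access log gives you roughly the same three columns the plugin shows: the URL, the number of requests and the time of the last one.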

In order to use Clean Sweep you will need some level of access to your Apache config, or you might be able to get it to work via .htaccess. I haven't tried it via .htaccess, so I can't confirm whether that method works.

First off, you will need to get your hands on the code. As the plugin is still "in development", the only way to get it is via SVN, which is a very popular version control system. Don't let the term "version control" scare you. I wouldn't know where to begin with SVN, but I do know how to "check out" (i.e. download) code from an SVN repository. I'm sure you will be able to work that much out too!

Once you have the code, install the plugin as normal.

When you log in to MT again you will be presented with an upgrade screen, as the plugin needs to create a new database table to store its data.

As the plugin description goes:

  • Clean Sweep is a plugin that helps administrators manage broken links on their blog, automatically correct the error if possible and then generate mod_rewrite rules to help correct the problem permanently.

To do that, you need to add a small bit of code, which you can copy from the plugin's config screen, to your Apache config so that Clean Sweep handles all the 404s.

You can also set a custom page to act as your 404 error page, maybe adding some text for human visitors telling them that the page has moved.

Once you have everything configured, wait a couple of hours (or less, depending on how busy your site is) and then visit the new 404s page from the "Manage" dropdown menu.

You'll see a simple page with the 404s, the number of requests for each one, the time of the last request and two options: "map" and "reset".

The "map" option is the one we are most interested in, as this allows you to handle the redirection of both human and computer visitors.

Find the new location of the entry or page in your site and then open up the "map" screen. You will be able to set the new location for each page that is causing a 404.

Once you have finished setting up your mappings for each 404 error, click the "Generate Rewrite Rules" link and you will be given a nice little set of rewrite rules that you can add to your Apache config or .htaccess file.
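If you are curious what such rules might look like, here is a small Python sketch that turns a set of old-to-new mappings into mod_rewrite lines. I haven't checked the exact format Clean Sweep emits, so treat this as an illustration only: the function name `rewrite_rules`, the example paths and the choice of a permanent (301) redirect are my assumptions.

```python
import re


def rewrite_rules(mappings):
    """Build one mod_rewrite redirect line per old -> new mapping."""
    rules = ["RewriteEngine On"]
    for old, new in sorted(mappings.items()):
        # Escape regex metacharacters in the old path. The "^/?" prefix lets
        # the same rule match in the server config (leading slash present)
        # and in .htaccess (leading slash stripped).
        pattern = "^/?" + re.escape(old.lstrip("/")) + "$"
        # R=301 sends a permanent redirect; L stops further rule processing.
        rules.append(f"RewriteRule {pattern} {new} [R=301,L]")
    return rules


if __name__ == "__main__":
    # Hypothetical mapping, purely for illustration.
    for rule in rewrite_rules({"/archives/000123.html": "/2007/09/clean-sweep.html"}):
        print(rule)
```

Old paths containing spaces would need quoting in the Apache config, which this sketch does not handle.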

The plugin is really handy. I've been using it since early this morning and so far it's redirected several hundred visitors that I would have lost otherwise.

Of course, it will also make you aware of some of the junk that is trying to get into your site, as you will see lots of dodgy requests for pages that you really don't want people to access!

2 Comments

Byrne said:

Michele,

Thanks for the great overview... I wanted to let you know that the plugin has been officially released.

But reading your write-up made me think of a few new features that could be really useful. For all those "dodgy requests" that come in, I wonder if Clean Sweep could somehow send those requests into a dead end, or return some other HTTP status code? I wonder also if some 404s could be flagged as "ignore" so that the system could just let them through as normal.

In the end, I am seeing the usefulness of this plugin well beyond what I had originally intended. It is about maintaining a healthy web site more than anything. I am just curious: what other features do you think could be added to make this even more useful and valuable?

Byrne

You could extend its uses by passing other HTTP codes to it as well :)

What would be useful is to be able to filter out things like the /trackback URIs, as they aren't of much use to anyone anyway.

If you really want to send the requests to a dead end you'd want to be using iptables :)

Being able to see which URLs people are trying to reach is interesting and being able to trap errors in a sane manner is excellent.

You *could* look into tracking excessive requests for certain resources and maybe do some kind of rate limiting, though to make that more useful you might want to track the User-Agent strings.

Michele

About this Entry

This page contains a single entry by Michele Neylon published on September 3, 2007 7:20 PM.