A Better 404 - Redux
A couple of months ago I wrote about how I’d modified my WordPress 404 page to be a little more useful and informative to any reader unlucky enough to encounter it. Amazingly, that article was out of date almost as soon as I’d published it, because I kept refining and supplementing the code I’d described there. I also added some functionality to the page for better “possible match” suggestions. So in this article I’m revisiting the custom 404 page to describe the changes I’ve made since the previous installment.
You see, one thing I realised was that if a link to a missing resource came in from a search-engine, I would have a referrer string that included the keyword(s) or phrase that had been submitted to that search-engine, since they almost all include that data in their URIs. Surely then, I should be exploiting that fact to the benefit of my visitor?
So my 404.php file begins with the standard message that all visitors will see:
<p>Sorry, but <em><?php echo htmlspecialchars($_SERVER['REQUEST_URI'], ENT_QUOTES, 'UTF-8'); ?></em> doesn't exist on the <a href="/" title="go to: the home-page..."><strong>Urban Mainframe</strong></a> at this time. If it once existed here then it may have been moved, renamed or deleted.</p>
Then, if there is a referrer, we inform the user that the inbound link is incorrect:
<?php if (!empty($_SERVER['HTTP_REFERER'])) { $referrer = htmlspecialchars($_SERVER['HTTP_REFERER'], ENT_QUOTES, 'UTF-8'); echo '<p>The link at <em><a href="' . $referrer . '">' . $referrer . '</a></em> is incorrect.</p>'; } ?>
This is followed by a little more helpful information. Then we get to the real magic of the page. We want to offer the reader a list of possible matches wherever we can, based on the URI they requested or on the keyword(s) or phrase they submitted to a search-engine. We do this by extracting keywords from the requested URI or from the referrer URI and performing an internal, full-text search on our content with those keywords.
We start by breaking the URI of the requested (erroneous) page into its component parts (ditching the domain name and TLD and splitting the remainder of the URI on the “/” character). Each URI slug is then appended to a string called “$keyword”, separated with a white-space. We do this for two reasons:
- It’s possible that the URI, while not directly matching a resource, still has enough information in it for us to “guess” at what the visitor is looking for.
- There might not be a referrer, or the referrer might not be a search-engine.
This means that if our search-engine referrer test returns nothing then we still might have a keyword or two to perform our internal search with.
NOTE: The “$keyword” variable will thus always be populated. If there is a match on the external search-engine referrer test, then the “$keyword” variable will be replaced with the results of that test.
$keyword = substr($_SERVER['REQUEST_URI'],1);
$keyword = urldecode(stripslashes($keyword));
$keyword = str_replace('/',' ',$keyword);
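The same idea can be tried outside WordPress as a small function. This is only a sketch: the name uri_to_keywords is hypothetical and not part of my 404.php, and dropping the query string and treating hyphens and underscores as word separators are extra assumptions that the three lines above do not make.

```php
<?php
// Hypothetical helper (not in the original 404.php): turn a request URI
// into a space-separated keyword string for the internal search.
function uri_to_keywords($request_uri)
{
    // Assumption: ignore any query string, e.g. "?ref=x".
    $parts = explode('?', $request_uri, 2);
    $path = urldecode($parts[0]);

    // Treat slashes (and, as an assumption, hyphens and underscores)
    // as word separators.
    $keywords = str_replace(array('/', '-', '_'), ' ', $path);

    // Collapse runs of white-space and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $keywords));
}

echo uri_to_keywords('/2007/05/a-better-404/?ref=x'), "\n";
```

Splitting slugs on hyphens gives the full-text search individual words to match, which may or may not suit your permalink structure.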
Then we check to see if the referrer is one of the search-engines we know about and, if it is, we extract the keyword(s) or phrase that the search-engine was processing. We then make our “$keyword” variable equal whatever the “query” variable is for a given search-engine. For Google that “query” variable is called “q”, for Lycos it’s “query”, for Yahoo it’s “p” and so on.
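$ref = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
// Split off the query string and parse it into an array, not into local variables,
// so a crafted referrer cannot overwrite variables such as $keyword or $wpdb
$s = explode("?", $ref, 2);
$params = array();
parse_str(isset($s[1]) ? $s[1] : '', $params);
// Only replace the URI-based $keyword when the search-engine parameter is present
if ( preg_match("#(google|msn|live|altavista|alltheweb|scirus)#si", $ref) && !empty($params['q']) ) {
$keyword = $params['q'];
} elseif ( preg_match("#(aol|vivisimo|lycos|aliweb)#si", $ref) && !empty($params['query']) ) {
$keyword = $params['query'];
} elseif ( strstr($ref, 'yahoo') && !empty($params['p']) ) {
$keyword = $params['p'];
} elseif ( strstr($ref, 'baidu') && !empty($params['wd']) ) {
$keyword = $params['wd'];
}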
For those of you who don’t speak PHP, there are a couple of things that need a little explanation here. “preg_match” performs a regular expression (RegEx) match against a string, “explode” splits a string on a delimiter (in our case the “?” character that precedes the “query” of a URL) and “parse_str” parses a query string into key/value pairs (in our case the string returned by the “explode” function).
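The referrer test can also be written as a lookup table, which makes adding another search-engine a one-line change. The following is a minimal sketch rather than the code in my 404.php: the function name keyword_from_referrer is hypothetical, and matching against the referrer's host name only (instead of the whole referrer string) is an assumption made here to reduce false positives.

```php
<?php
// Hypothetical helper: return the search phrase carried by a search-engine
// referrer, or null when there is none.
function keyword_from_referrer($referrer)
{
    // Host pattern => name of the query parameter holding the search phrase.
    $engines = array(
        '#(google|msn|live|altavista|alltheweb|scirus)#i' => 'q',
        '#(aol|vivisimo|lycos|aliweb)#i' => 'query',
        '#yahoo#i' => 'p',
        '#baidu#i' => 'wd',
    );

    $host = parse_url($referrer, PHP_URL_HOST);
    $query = parse_url($referrer, PHP_URL_QUERY);
    if (!is_string($host) || !is_string($query)) {
        return null; // no referrer, or no query string to inspect
    }

    // Parse into an array so nothing is injected into the local scope.
    $params = array();
    parse_str($query, $params);

    foreach ($engines as $pattern => $param) {
        if (preg_match($pattern, $host)) {
            $found = isset($params[$param]) && is_string($params[$param]) && $params[$param] !== '';
            return $found ? $params[$param] : null;
        }
    }
    return null;
}

var_dump(keyword_from_referrer('http://www.google.com/search?q=better+404&hl=en'));
```

A null result means the caller should keep the “$keyword” string it built from the requested URI.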
With our “$keyword” now defined, we then pass this string into our internal search mechanism and hopefully we’ll get some “Possible Match(es)” that we can offer the reader:
$limit=10;
$len=25;
$before_title = '<li>';
$after_title = '</li>';
$before_post = '';
$after_post = '';
$show_pass_post = false;
$show_excerpt = false;
global $wpdb, $post;
// Make sure the post is not from the future
$time_difference = get_option('gmt_offset'); // get_settings() is the deprecated name for this function
$now = gmdate("Y-m-d H:i:s",(time()+($time_difference*3600)));
// Primary SQL query; $wpdb->prepare() quotes and escapes the placeholder values
$exclude_id = isset($post->ID) ? (int) $post->ID : 0;
$sql = "SELECT ID, post_title, post_content, "
. "MATCH (post_name, post_content) AGAINST (%s) AS score "
. "FROM $wpdb->posts WHERE "
. "MATCH (post_name, post_content) AGAINST (%s) "
. "AND post_date <= %s "
. "AND post_status IN ( 'publish', 'static' ) AND ID != %d ";
// Exclude password-protected posts unless $show_pass_post is enabled
if (!$show_pass_post) { $sql .= "AND post_password = '' "; }
$sql .= "ORDER BY score DESC LIMIT %d";
$results = $wpdb->get_results($wpdb->prepare($sql, $keyword, $keyword, $now, $exclude_id, $limit));
$output = '';
if ($results) {
foreach ($results as $result) {
$title = stripslashes($result->post_title);
$permalink = get_permalink($result->ID);
$post_content = strip_tags($result->post_content);
$post_content = stripslashes($post_content);
$output .= $before_title .'<a href="'. $permalink .'" rel="bookmark" title="Permanent Link: ' . $title . '">' . $title . '</a>' . $after_title;
if ($show_excerpt) {
$words = explode(" ", $post_content); // split() was removed in PHP 7
$post_strip = join(" ", array_slice($words,0,$len));
$output .= $before_post . $post_strip . $after_post;
}
}
echo "<h3>Possible Match(es)</h3>";
echo '<ol style="margin-left: 40px;">';
echo $output;
echo "</ol>";
}
For more on the above code please refer to the original source.
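The excerpt step, which only runs when “$show_excerpt” is enabled, can be tried in isolation. This is a small sketch under stated assumptions: the function name first_words is hypothetical, and splitting on any run of white-space (rather than on single spaces, as the code above does) is a deliberate change so that line breaks in a post don't glue words together.

```php
<?php
// Hypothetical helper: return the first $len words of a post body,
// with HTML tags removed.
function first_words($post_content, $len)
{
    $text = trim(strip_tags($post_content));

    // Split on any run of white-space and drop empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    return implode(' ', array_slice($words, 0, $len));
}

echo first_words('<p>One two three four</p>', 3), "\n";
```

With “$len” set to 25, as in the settings above, this would yield roughly a sentence or two per suggested post.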
As in my original article, I must stress that in order for the full-text search to work you must have full-text indexing configured on your WordPress database. The following SQL command is all you need to enable this:
ALTER TABLE `wp_posts` ADD FULLTEXT `post_related` (`post_name`, `post_content`);
That’s all there is to it! For very little code we can provide a useful and informative 404-page to our visitors. Hopefully very few will see it but, for those that do, we’re giving just a few more hints and nudges in the right direction than most websites. Our website is more user-friendly and that’s always worth making a little extra effort for.
For your reference, I’m making the full source code of my 404.php file available. If you use it I’d appreciate a credit and link - but that’s certainly not obligatory. Now go and fix up your 404-page and let’s make the Web a better place!
Brilliant idea! Thank you for sharing.
(To run the ALTER TABLE statement in phpMyAdmin, the table, index and column names must be quoted with backticks (`) rather than apostrophes, as in the original post.)