This document is available on the Internet at: http://urbanmainframe.com/folders/blog/20040923/folders/blog/20040923/
My CMS periodically checks all links in my content to ensure that the resources they point to are actually available.
My software uses HTTP::Response to check the status of each link. If the response code is in the "2xx" range, the link is okay; if it's "4xx" or "5xx", the link is bad. When the link-check process completes, the CMS emails me a report listing any broken links and their status codes. If there are no broken links, the CMS simply emails me to confirm that it completed a link-check, with a record of the date and time when the test was performed.
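As a rough illustration of that rule, here is a minimal sketch in Python. It is not the CMS's actual code, which is built on Perl's HTTP::Response. The names `fetch_status`, `classify_status` and `build_report` are hypothetical, and the "unknown" bucket for codes outside the ranges mentioned above is an assumption.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Request a URL and return the HTTP status code the server reports."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still available.
        return err.code


def classify_status(code: int) -> str:
    """Apply the rule from the post: 2xx is okay, 4xx and 5xx are bad."""
    if 200 <= code <= 299:
        return "okay"
    if 400 <= code <= 599:
        return "bad"
    # Assumption: the post does not say how other codes are treated.
    return "unknown"


def build_report(results: dict[str, int]) -> list[str]:
    """List only the broken links, each with its status code."""
    return [
        f"{url} -> {code}"
        for url, code in sorted(results.items())
        if classify_status(code) == "bad"
    ]
```

Note that a checker like this only ever sees the status code. It trusts the server to report that code honestly.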
Therefore, there should never be a broken link on the Urban Mainframe.
That's the theory anyway. Unfortunately, as I was clicking through some of the older entries in the Link Archives this evening, I actually found a bad one...
The resulting 404 error page caught me completely by surprise, especially since each link's status is displayed in the archive listing and this particular link was described as "Okay".
I manually ran the link checker and waited for the email that would list the broken link - but it never came! Instead, I got the "all okay" email!
The obvious conclusion was that there was a bug in my link checker. Now I enjoy debugging as much as the next masochist, so I convinced myself that the fault was with Yahoo! (the link's destination) rather than my CMS. So I checked the rogue URL with Rex Swain's HTTP Viewer (an extremely handy tool), with interesting results.
The interesting part reads, "
For those of you who don't speak geek: the Yahoo! web-server/CMS can't find the resource I have requested (HTTP/1.1 status 404), yet it tells the client browser that it has (HTTP/1.1 status 200)! Therefore, any device that requests the status of a resource from story.news.yahoo.com is blatantly misinformed.
It's commendable that Yahoo! are offering a customised and otherwise useful 404 page, but somebody should configure the system to correctly return a status of "404" with that page, rather than the erroneous "200".
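To make the difference concrete, here is a small self-contained Python sketch that serves the same custom "not found" page twice from a throwaway local server, once with status 200 and once with status 404, and reports what a client sees each time. Everything in it is hypothetical (the page text, the `/no-such-story` path and the function names), and it says nothing about how Yahoo!'s servers are actually configured.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical friendly error page; the body is identical in both cases.
NOT_FOUND_PAGE = b"<html><body><h1>Sorry, that page was not found.</h1></body></html>"


def make_handler(missing_status: int):
    """Build a handler that answers every GET with the custom error page."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(missing_status)  # the only thing that varies
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(NOT_FOUND_PAGE)))
            self.end_headers()
            self.wfile.write(NOT_FOUND_PAGE)

        def log_message(self, format, *args):
            pass  # keep the demo quiet

    return Handler


def status_seen_by_client(missing_status: int) -> int:
    """Start a local server, request a missing page, return the status seen."""
    server = HTTPServer(("127.0.0.1", 0), make_handler(missing_status))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Bypass any configured proxy so the request goes straight to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    url = f"http://127.0.0.1:{server.server_address[1]}/no-such-story"
    try:
        try:
            with opener.open(url, timeout=5) as response:
                return response.status
        except urllib.error.HTTPError as err:
            return err.code
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print("misconfigured server:", status_seen_by_client(200))
    print("correct server:", status_seen_by_client(404))
```

A human visitor sees the same helpful page in both cases. Only the second configuration tells a link checker the truth.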