
Latest Addition: The Dynamic Blogroll

Date: Fri, 28th-May-2004

Tags: Announcements, CMS, Databases, Perl, Programming, Urban Mainframe, Website Development

We all know that blogs thrive on linkage and mine is no different to any other. Thus, I have added a dynamic blogroll to the weblog...

Defining the Problem

Some of my long-standing readers will remember that I had a blogroll once before, which subsequently disappeared. That blogroll was simply a list of links that were hard-coded into my template. It therefore exhibited two annoying characteristics: the template had to be manually edited for every revision and the list wasn't dynamic. So I killed it, promising myself that it would be reincarnated when I had overcome those two issues.

The task of writing a blogroll handler was then added to my list of things-to-do. However, it didn't have a high priority for me, so it kept getting pushed further and further down that list - until today!

In writing the handler I kept my two prerequisites in mind:

  • I must be able to add and remove links quickly and easily
  • the list must update itself with a reasonable degree of frequency

It seemed simple enough but, once I actually began to flowchart the logic of the handler, I realised that my requirement for dynamism would demand an inordinate amount of bandwidth and processing. Consider the following process:

  • retrieve the blogroll list from the RDBMS
  • retrieve the RSS feed of each listed weblog - which, for a large blogroll, suggests significant delays (the handler must also be able to cope gracefully with a website that is down)
  • parse the XML, extracting the timestamp of the latest revision of each
  • order the output list accordingly

To run this process for each relevant page request would obviously be utter madness, so I would also have to write some sort of caching system for the blogroll. Oh I love a challenge!
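To make the cost concrete, here is a minimal sketch of the parse-and-order half of that process. It is written in Python purely for illustration and is not the handler's actual code. It assumes each feed is RSS 2.0 with a pubDate element per item, that the feeds have already been fetched, and that a failed fetch is represented as None. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def latest_revision(rss_xml):
    """Return the newest <pubDate> found in an RSS 2.0 feed, or None."""
    try:
        root = ET.fromstring(rss_xml)
    except ET.ParseError:
        return None  # treat a malformed feed like a site that is down
    dates = []
    for node in root.iter("pubDate"):
        try:
            dates.append(parsedate_to_datetime(node.text.strip()))
        except (TypeError, ValueError, AttributeError):
            continue  # skip empty or unparseable dates
    return max(dates) if dates else None


def order_blogroll(feeds):
    """feeds maps a weblog name to its raw RSS text (None if the fetch failed).

    Returns the names of the usable weblogs, most recently updated first.
    """
    dated = []
    for name, xml_text in feeds.items():
        stamp = latest_revision(xml_text) if xml_text else None
        if stamp is not None:
            dated.append((stamp, name))
    return [name for stamp, name in sorted(dated, reverse=True)]
```

Even in this reduced form, every listed weblog costs one HTTP request plus one XML parse, which is why running it on each page request is out of the question.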

The Solution (Almost)

There's nothing groundbreaking about a dynamic blogroll. Lots of weblogs have them (Ryan Brill, Dunstan Orchard, Simon Willison...). So I turned to Google, knowing that someone, somewhere, would have documented a procedure. I quickly found a solution published by Paul Hammond, complete with source code.

In the end, I didn't use Hammond's blo.gs.pl script. But I did take something from his methodology.

blo.gs.pl is a script to grab a list of recently updated weblogs from blo.gs and display them on your site.

The significance of the above was not lost on me. The facilities provided by blo.gs would take care of the first three steps of the process I described earlier. Furthermore, I could use blo.gs to maintain my blogroll list too!

blo.gs generates a dynamic blogroll, on request, and in a variety of formats for its users. So I set up an account, added the weblogs I wanted to list and ended up with a raw blogroll.

Using blo.gs to do all the "heavy lifting" meant that my handler could be greatly simplified:

  • retrieve the raw blogroll from blo.gs with LWP-Simple
  • parse the data, extracting the title, URL and timestamp of each item
  • apply time_since() to the timestamps
  • add a dash of XHTML and CSS for presentation
  • bake, in the middle of the oven, for 20ms at gas-mark 7
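The time_since() step is the only one above that involves any real logic. The function's source isn't shown in this post, so the following is just one plausible shape for it, sketched in Python. The wording of the output and the choice of units are assumptions.

```python
from datetime import datetime, timedelta


def time_since(then, now):
    """Describe the gap between two datetimes as a rough 'N units ago' string."""
    seconds = int((now - then).total_seconds())
    if seconds < 60:
        return "just now"
    # Try the largest unit first and report only that one.
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        count = seconds // size
        if count >= 1:
            plural = "" if count == 1 else "s"
            return f"{count} {unit}{plural} ago"
```

Passing "now" in explicitly, rather than reading the clock inside the function, keeps the result reproducible and makes the function easy to test.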

I coded up this process and tested. Everything worked well, with one caveat: blo.gs was sometimes slow to respond to my HTTP GET request. Of course, this should be expected. We've already discovered that there's a lot of work involved in keeping these lists up-to-date and (at the time of writing) blo.gs, bless her, is processing "1,850,062 blogs for 6,605 users." So I could forgive her the delays.

This presented me with something of a dilemma, however. I had a solution that worked. I had simplicity, efficiency and ease-of-management. I didn't have performance. I have worked hard to optimise Shapeshifter for fast page generation and delivery. I pride myself on the fact that, 99.99% of the time, the Urban Mainframe's pages are rendered in the sub-0.50s range. All my efforts were being mocked by my new blogroll!

Cache to the Rescue

I decided to implement a simple caching system for the blogroll. It works like this (for each weblog page request):

  • retrieve the cached blogroll from the RDBMS, if one exists and it is less than one hour old
  • if a blogroll was successfully retrieved from the cache, update the time_since() placeholder for each item, output it, then exit
  • otherwise, request the raw file from blo.gs and process it - but do not apply the time_since() function to the timestamps (as this will only be accurate if it is applied for each page request)
  • write the processed file into the cache and timestamp it, then exit

This process means that I can cache the blogroll yet still have a valid "time_since" record for each item.

Are you wondering how "retrieve the cached blogroll from the RDBMS if one exists and it is less than one hour old" is implemented? I use the power of SQL to conditionally "select" the cached item based on its age:

SELECT token, timestamp FROM cache WHERE label = 'cached_blogroll' AND timestamp > DATE_SUB(NOW(), INTERVAL 1 HOUR)

Cool eh?
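To see the age test in action without a database server, the same idea can be tried against an in-memory SQLite database from Python. This is only an approximation: SQLite has no DATE_SUB(), so datetime('now', '-1 hour') is swapped in for it, and the table layout and sample rows are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (label TEXT, token TEXT, timestamp TEXT)")

# One entry that is too old to use and one that is still fresh.
conn.execute(
    "INSERT INTO cache VALUES "
    "('cached_blogroll', 'stale list', datetime('now', '-2 hours'))"
)
conn.execute(
    "INSERT INTO cache VALUES "
    "('cached_blogroll', 'fresh list', datetime('now', '-10 minutes'))"
)

# Only rows younger than one hour satisfy the WHERE clause.
rows = conn.execute(
    "SELECT token, timestamp FROM cache "
    "WHERE label = ? AND timestamp > datetime('now', '-1 hour')",
    ("cached_blogroll",),
).fetchall()
```

An empty result set doubles as the "cache miss" signal, so no separate expiry check is needed in the application code.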


The blogroll is output as a simple, unordered list which is then styled with CSS. The "last revision" timestamp for each link is displayed, in time_since() format, in the link's "title" attribute (hold the mouse-pointer on a link to reveal the timestamp). Similarly, the "updated hourly" text in the panel header will reveal the age of the cached blogroll.
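As a rough illustration of that output, the sketch below builds such a list in Python. The exact markup the handler emits isn't shown in this post, so the structure here is an assumption, and render_blogroll is a hypothetical name.

```python
from html import escape


def render_blogroll(items):
    """Build an unordered list from (title, url, age_text) tuples.

    The age text goes into each link's title attribute, so it appears
    as a tooltip when the pointer rests on the link.
    """
    lines = ["<ul>"]
    for title, url, age_text in items:
        lines.append(
            f'  <li><a href="{escape(url, quote=True)}" '
            f'title="{escape(age_text, quote=True)}">{escape(title)}</a></li>'
        )
    lines.append("</ul>")
    return "\n".join(lines)
```

Escaping the title, URL and age text matters here because the values come from a third-party service and must not be allowed to break the XHTML.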

So there you have it. A neat, dynamic blogroll that's simplicity itself to manage. Damn, I'm good! :-)
