This document is available on the Internet at: http://urbanmainframe.com/folders/blog/20040902/
D. Keith Robinson recently solicited his readers' opinions regarding the delivery of RSS feeds from the Asterisk website.
Robinson's initial article, subsequent follow-up and considerable user feedback have all served to solidify and vindicate some of my own opinions relating to the production and serving of RSS content...
Don't use a combined feed for all content types (articles, links, photographs, etc.). Apply a little granularity by using separate feeds for distinct data types. Remember: the user can subscribe to multiple feeds from your website as required.
If possible, you should categorise your content and offer individual feeds for each category. For example: Technology, Music, Literature, News, etc.
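The grouping can be sketched in a few lines. This is a hypothetical example, not the Urban Mainframe's actual publishing code; the article data and site name are invented for illustration.

```python
# Sketch: emit one RSS 2.0 document per content category.
import xml.etree.ElementTree as ET
from collections import defaultdict

articles = [
    {"title": "Gmail: Desirable or Not?", "link": "http://example.com/1", "category": "Technology"},
    {"title": "On Late Beethoven", "link": "http://example.com/2", "category": "Music"},
    {"title": "Server Upgrade Complete", "link": "http://example.com/3", "category": "News"},
]

def build_category_feeds(articles):
    """Group articles by category and build one RSS channel per group."""
    groups = defaultdict(list)
    for article in articles:
        groups[article["category"]].append(article)

    feeds = {}
    for category, items in groups.items():
        rss = ET.Element("rss", version="2.0")
        channel = ET.SubElement(rss, "channel")
        ET.SubElement(channel, "title").text = f"Example Site: {category}"
        for item in items:
            el = ET.SubElement(channel, "item")
            ET.SubElement(el, "title").text = item["title"]
            ET.SubElement(el, "link").text = item["link"]
        feeds[category] = ET.tostring(rss, encoding="unicode")
    return feeds
```

Each resulting document would be served at its own URL (e.g. /feeds/technology.xml), leaving the reader free to subscribe only to the categories she cares about.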
Of course, such a fine-grained approach might be impractical or incompatible with your content delivery mechanism. If this is the case, then consider prefixing the titles of each element with a tag that describes the data type.
[Link] Read, reference and comment - an online format that does it all
[Link] The world's two worst variable names
[Weblog] Gmail: Desirable or Not?
[Image] President Bush Greets France's Jacques Chirac
At least this way the user can see, at a glance, what each element in any given feed is - they know what to expect if they click through.
Element classification, whether expressed with distinct feeds or title prefixes, is also useful to those who are programmatically syndicating your content.
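To see why the prefixes help programmatic consumers, consider this small, hypothetical consumer-side script: because each title carries a "[Type]" tag, a syndicator can route or filter items by type with a one-line pattern match.

```python
# Sketch: split "[Type] Title" strings so items can be routed by type.
import re

titles = [
    "[Link] The world's two worst variable names",
    "[Weblog] Gmail: Desirable or Not?",
    "[Image] President Bush Greets France's Jacques Chirac",
]

def classify(title):
    """Return (type, bare title) for a '[Type] Title' string."""
    m = re.match(r"\[(\w+)\]\s*(.*)", title)
    return (m.group(1), m.group(2)) if m else ("Unknown", title)

# e.g. keep only the weblog entries from a mixed feed
weblog_items = [t for t in titles if classify(t)[0] == "Weblog"]
```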
Some readers prefer content summaries, others prefer full text. As content producers, we want to satisfy both camps. However we worry that, if we deliver full text feeds to our readers, they will no longer need to visit our websites to enjoy the content that we labour to produce. Visitors are a website's raison d'être - without them, our ad revenues dwindle and our wonderful designs go unnoticed. Seems like justification enough to provide only summaries, doesn't it?
I always thought so - indeed, at the time of writing, full text feeds are not available from the Urban Mainframe. That's going to change...
Consider the reader who, for whatever reason, prefers a full text feed. I want readers to visit my website so I've elected to provide only content summaries. What is the aforementioned reader going to do? Is she going to shrug her shoulders and subscribe to my summaries despite her preference, or is she going to dismiss my website - never to return?
If the reader wants full text then the content producer must deliver it. Otherwise we are putting a barrier in front of the reader and it might just be one obstacle too many.
We can still get eyeballs on web-pages, if we provide incentives - if there is something on the website that can't be made available through content syndication. Obvious incentives include interactive components like forums, comment systems, user polls and ratings. Users have to visit to submit search queries or browse through archived content. All of which leads me to conclude that we won't lose visitors by providing full text feeds, but we might if we don't!
Content producers should be aware that some subscribers don't use RSS Readers to handle their subscriptions. They use web-based services like My Yahoo! or blo.gs. Thus it is important that the content producer "ping" (i.e. inform) these services when they publish new material.
CMS applications like WordPress and Movable Type have options to automatically ping various notification services when new content is published. The content producer should take full advantage of these facilities.
What if your CMS doesn't provide a pinging mechanism? The web-based Ping-O-Matic can ping up to 14 notification services on your behalf.
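These notification services speak XML-RPC: a single "weblogUpdates.ping" call carrying the site's name and URL. The sketch below only builds the request payload with Python's standard library; actually sending it is left as a comment, and the endpoint mentioned there is illustrative.

```python
# Sketch: serialise a weblogUpdates.ping request as an XML-RPC payload.
import xmlrpc.client

def build_ping(site_name, site_url):
    """Build the XML-RPC body for a standard weblogUpdates.ping call."""
    return xmlrpc.client.dumps((site_name, site_url),
                               methodname="weblogUpdates.ping")

payload = build_ping("Urban Mainframe", "http://urbanmainframe.com/")
# To actually notify a service you would POST this payload to its
# XML-RPC endpoint, e.g. via
# xmlrpc.client.ServerProxy(endpoint_url).weblogUpdates.ping(name, url)
```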
I've noticed that, from some of the websites I have subscribed to, I get repeated notifications of new content when, in fact, a visit to the website in question seems to yield no such thing. Alternatively, I see items drop into my RSS Reader that I have already read. What's going on?
These duplicates are caused by incorrect time-stamping of feed items. Either the content producer or their CMS is updating the "pubDate" field of an item when that item, or the content it describes, is modified (during post-publish edits). The RSS Reader (or notification service) uses the "pubDate" field to determine whether or not the relevant item is new (from the user's perspective). These modified items therefore register as new to the RSS Reader, hence the false notifications and duplicates.
This shouldn't happen. As I've written before, the RSS 2.0 Specification dictates that the "pubDate" field "indicates when the item was published", not when the item was modified.
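The fix is simple to express in code. This is a minimal, CMS-agnostic sketch: set "pubDate" exactly once at publish time, and record any later edits in a separate field so the publication date never moves.

```python
# Sketch: keep pubDate immutable; track edits in a separate field.
from datetime import datetime, timezone
from email.utils import format_datetime  # RFC 822 dates, as RSS 2.0 requires

def publish(item, now):
    item["pubDate"] = format_datetime(now)       # set once, at publication
    return item

def edit(item, new_summary, now):
    item["summary"] = new_summary
    item["lastModified"] = format_datetime(now)  # edits tracked separately
    # crucially: item["pubDate"] is left untouched
    return item
```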
If notifications of modified pages are deemed to be necessary, offer them as a separate feed. Remember: give the user selectivity.
That small, orange "XML" button means absolutely nothing to the average web user. When clicked, the button returns what appears to be garbage (with a few exceptions) - so, if anything, the average user is learning to AVOID the button!
This needn't be the case. Use XSLT (or even CSS) to apply styling to your feeds and make them human-readable. XHTML comments can be employed within the feed to explain to the user what it is and how it can be used with an RSS Reader. XSLT, CSS and XHTML comments have no effect on RSS Readers, work equally well for RSS, RDF and Atom, and do not compromise feed validation.
See Sean M. Burke's "Making RSS Pretty" for implementation details.
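As a taste of the technique: adding an xml-stylesheet processing instruction points browsers at an XSLT file, while feed readers simply ignore the instruction. The helper below and the stylesheet path are hypothetical, shown only to make the mechanism concrete.

```python
# Sketch: insert an xml-stylesheet PI after the XML declaration, if any.
def add_stylesheet(feed_xml, xsl_href="/styles/rss-pretty.xsl"):
    """Return the feed with an xml-stylesheet processing instruction."""
    pi = f'<?xml-stylesheet type="text/xsl" href="{xsl_href}"?>\n'
    if feed_xml.startswith("<?xml"):
        decl, _, rest = feed_xml.partition("?>")
        return decl + "?>\n" + pi + rest.lstrip("\n")
    return pi + feed_xml

styled = add_stylesheet('<?xml version="1.0"?>\n<rss version="2.0"></rss>')
```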
RSS is incredibly wasteful of bandwidth. Consider the following:
A fictional website is beginning to get popular, its unique visitor count increasing on a daily basis. The website's operators introduce a single XML feed which describes their ten most recently published articles with a brief summary of each. Let's say that, on average, the feed weighs in at 5KB.
The new feed quickly acquires 100 subscribers, who each use a desktop client to retrieve that feed every hour (a not uncommon configuration).
The content producers publish a new article every day.
This means that around 12MB of data is served every day (do the math) just to service the RSS facility, and all for one new article! These numbers extrapolate alarmingly if the feed weight increases (full-text anyone?) and as the number of subscribers climbs. They also look considerably worse if content production is less frequent.
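For the sceptical, here is the arithmetic behind that 12MB figure, spelled out:

```python
# The bandwidth arithmetic for the fictional website above.
feed_size_kb = 5       # average weight of the feed
subscribers = 100
polls_per_day = 24     # each client fetches hourly

kb_per_day = feed_size_kb * subscribers * polls_per_day
mb_per_day = kb_per_day / 1024
# 5 * 100 * 24 = 12,000 KB, roughly 11.7 MB served every day
# for a site that publishes just one new article in that time
```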
As Charles Miller tells us, we can mitigate the RSS scalability problem on both the client and the server.
The HTTP "last-modified" header, gzip compression and RSS "ttl" element can all be used on the server side.
As for RSS clients, these should adhere to the standards by checking for, and responding appropriately to, the "last-modified" and "ttl" values. Subscribers can also reduce bandwidth consumption by configuring their RSS Readers to poll their target websites less frequently.
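The server-side "last-modified" optimisation amounts to very little code. This framework-agnostic sketch answers 304 Not Modified, with an empty body, whenever the client's If-Modified-Since header matches the feed's modification time; the handler interface is invented for illustration.

```python
# Sketch: conditional GET handling for a feed endpoint.
def handle_feed_request(if_modified_since, feed_last_modified, feed_body):
    """Return (status, headers, body) for a possibly-conditional request."""
    headers = {"Last-Modified": feed_last_modified}
    if if_modified_since == feed_last_modified:
        return (304, headers, "")        # client's cached copy is fresh
    return (200, headers, feed_body)     # send the full feed
```

Real implementations compare parsed dates rather than raw strings, but matching the exact value the server itself sent earlier is the common shortcut.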
Webmasters should also avoid providing unnecessary RSS feeds: there seems to be a growing trend, for example, to provide feeds of user comments on weblogs. To do so is absolutely ridiculous and akin to throwing money down the drain. This is particularly vexing because it is completely avoidable: email is far better suited to such notifications and is accessible to a wider range of users.
This, then, is the roadmap for the future development of RSS feeds on the Urban Mainframe and I hope that, at the very least, it provides food for thought for those of you who are producing RSS feeds of your own.
Some of the specifics discussed here are already implemented on this website, several of them aren't - but I'm working on it.
I'd like to thank D. Keith Robinson and his readers for their insights. Like Robinson, I am certain that content syndication will become more and more mainstream as awareness grows. After all, it's extremely hard not to appreciate the obvious benefits.
I do think there are some issues that still need to be addressed though: content syndication needs to be more widely advocated and the subscription process simplified. The delivery mechanisms need to be carefully considered too, so that the bandwidth demands might be reduced.
It's going to be an interesting ride.