This document is available on the Internet at: http://urbanmainframe.com/folders/blog/20040124/folders/blog/20040124/
Many websites are starting to use XHTML and CSS to offer definitions of acronyms and abbreviations with the <acronym> and <abbr> tags. I spend a little time teaching Shapeshifter an acronym or two...
I have been creating websites since 1996. It's strange, then, that I am only now beginning to consider using the <acronym> and <abbr> tags in my web pages, especially since these have been available since at least HTML 4.
There has been some confusion as to what constitutes an acronym or abbreviation and the difference between the two, not to mention the little-used "initialism". I am not going to reiterate the arguments and definitions here (see: Craig Scalia's "HTML is not an Acronym" for more information). Furthermore, not all browsers support both tags. For example, MSIE ignores the <abbr> tag for some strange reason. This is a shame since these tags provide a rather neat and semantically structured way of defining acronyms (or abbreviations, or initialisms) and presenting those definitions to the user.
Since the use of at least the <acronym> tag seems to be becoming a de-facto standard these days, I decided to implement it on the Urban Mainframe.
The most obvious way to do this was to add a little code to Shapeshifter's parser to look for acronyms and apply the relevant markup. First though, I needed a new table in the database to accommodate the acronyms (for pattern matching) and their respective definitions:
CREATE TABLE acronyms (
    acronym VARCHAR(32) NOT NULL DEFAULT '',
    definition TEXT NOT NULL,
    -- The primary key already enforces uniqueness, so no separate UNIQUE KEY is needed.
    PRIMARY KEY (acronym)
) ENGINE=MyISAM COMMENT='Acronym Definitions';
I then added the following Perl code to Shapeshifter's parser:
# Get acronym definitions...
my $dbh = DBI->connect($dsn, $id, $password, \%attr);    # DBI expects a hash reference here
my $sth = $dbh->prepare(qq{SELECT * FROM acronyms ORDER BY LENGTH(acronym) DESC});
$sth->execute();
while (my $lookup = $sth->fetchrow_hashref()) {
    # Mark up the first occurrence only (no /g); \Q...\E matches the acronym literally.
    $_[0] =~ s/(\W|\s)(\Q$lookup->{'acronym'}\E)(\W|\s)/$1<acronym title="$lookup->{'definition'}">$2<\/acronym>$3/;
    # Strip the markup again wherever it landed inside a title="" or alt="" attribute.
    $_[0] =~ s/((title|alt)=".*?)<acronym title="\Q$lookup->{'definition'}\E">(\Q$lookup->{'acronym'}\E)<\/acronym>(.*?")/$1$3$4/g;
}
$sth->finish();
$dbh->disconnect();
As you can see, the code is rather simple ("$_[0]" is the content to be parsed):
We loop through the contents of the "acronyms" table. The "ORDER BY LENGTH(acronym) DESC" clause ensures that the longer acronyms are processed first, which safeguards against some nesting possibilities. Then we use Perl's RegEx engine to scan the page content. If it finds a match, it wraps the acronym in the appropriate markup. Logically, we only need to give a definition for the first instance of each acronym on any given page, so the first RegEx omits the global modifier ("/g"). As it stands, our parser will "break" by matching inside other HTML markup. The second RegEx checks for this and removes the added markup if it has been placed within a "title" or "alt" attribute.
This code works pretty well but has two big flaws:
If anyone can offer an alternative solution that addresses either or both of these flaws, then I'd appreciate it if you'd share the knowledge.
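One direction that might help with the problem of matching inside markup is to split the content into tag and text chunks first, and only ever substitute inside the text chunks. What follows is a minimal sketch of that idea, not Shapeshifter's actual parser: the %acronyms hash stands in for the database table, and the mark_acronyms name and the sample input are made up for illustration. It also assumes that definitions contain no double quotes or markup, and it relies on a simple regular expression rather than a real HTML parser, so comments, <script> blocks and the like are not handled specially.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the rows of the "acronyms" table.
my %acronyms = (
    'CSS'   => 'Cascading Style Sheets',
    'HTML'  => 'HyperText Markup Language',
    'XHTML' => 'Extensible HyperText Markup Language',
);

# Wrap the first text occurrence of each acronym; tags are never touched.
sub mark_acronyms {
    my ($content, $defs) = @_;

    # The capturing group makes split() keep the tags, so the list
    # alternates: even indices hold text, odd indices hold tags.
    my @chunks = split /(<[^>]*>)/, $content;

    # One alternation, longest acronym first (as with ORDER BY LENGTH DESC),
    # so that "XHTML" is preferred over "HTML" at the same position.
    my $pattern = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) || $a cmp $b }
        keys %$defs;

    my %seen;    # acronyms that have already been given a definition
    for my $i (grep { $_ % 2 == 0 } 0 .. $#chunks) {
        $chunks[$i] =~ s{(?<!\w)($pattern)(?!\w)}{
            $seen{$1}++
                ? $1    # later occurrences stay plain
                : '<acronym title="' . $defs->{$1} . '">' . $1 . '</acronym>'
        }ge;
    }
    return join '', @chunks;
}

print mark_acronyms('<p><img alt="CSS logo"> XHTML and CSS; more CSS.</p>', \%acronyms), "\n";
```

With the sample input, the "CSS" inside the alt attribute is left alone, "XHTML" and the first visible "CSS" are wrapped, and the second visible "CSS" stays plain. Because every acronym is handled in a single pass, the inserted markup is never rescanned, so a second "clean-up" substitution is not needed.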
All that remained then was to add a little sprinkling of CSS to handle the presentation of the acronyms:
acronym { border-bottom: 1px dotted; cursor: help; }
/* Hides from IE-mac \*/
* html acronym { border: 0; font-weight: bold; font-style: italic; }
/* End hide from IE-mac */
The effect of all this effort:
In Mozilla, acronyms are displayed with a dotted underline; in MSIE, they are displayed bold and italic. In both browsers, hovering the mouse pointer over an acronym shows a tooltip containing its definition, and the pointer changes to the help pointer.
Best of all though, the Urban Mainframe benefits from a little more accessibility.