Acronym Definitions

Date: Sat, 24th-Jan-2004

Tags: CMS, Databases, Perl, Programming, Urban Mainframe, Website Development

Many websites are starting to use XHTML and CSS to offer definitions of acronyms and abbreviations with the <acronym> and <abbr> tags. I spend a little time teaching Shapeshifter an acronym or two...

I have been creating websites since 1996. It's strange then that I am only now beginning to consider using the <acronym> and <abbr> tags in my web-pages, especially since these have been available since at least HTML 4.

There has been some confusion as to what constitutes an acronym or abbreviation and the difference between the two, not to mention the little-used "initialism". I am not going to reiterate the arguments and definitions here (see: Craig Scalia's "HTML is not an Acronym" for more information). Furthermore, not all browsers support both tags. For example, MSIE ignores the <abbr> tag for some strange reason. This is a shame since these tags provide a rather neat and semantically structured way of defining acronyms (or abbreviations, or initialisms) and presenting those definitions to the user.

Since the <acronym> tag, at least, seems to be becoming a de facto standard these days, I decided to implement it on the Urban Mainframe.

The most obvious way to do this was to add a little code to Shapeshifter's parser to look for acronyms and apply the relevant markup. First though, I needed a new table in the database to accommodate the acronyms (for pattern matching) and their respective definitions:

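-- The primary key already enforces uniqueness, so a separate UNIQUE KEY on the same column is redundant.
-- ENGINE= replaces the older TYPE= spelling, which later MySQL versions no longer accept.
CREATE TABLE acronyms (
    acronym VARCHAR(32) NOT NULL DEFAULT '',
    definition TEXT NOT NULL,
    PRIMARY KEY (acronym)
) ENGINE=MyISAM COMMENT='Acronym Definitions';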

I then added the following Perl code to Shapeshifter's parser:

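# Get acronym definitions, longest first...
my $dbh = DBI->connect($dsn, $id, $password, \%attr);  # DBI expects the attributes as a hash reference
my $sth = $dbh->prepare(q{SELECT * FROM acronyms ORDER BY LENGTH(acronym) DESC});
$sth->execute();
while (my $lookup = $sth->fetchrow_hashref()) {
    my ($acronym, $definition) = @{$lookup}{qw(acronym definition)};
    # Mark up the first occurrence only (no /g). \Q...\E keeps regex
    # metacharacters in the stored text from being read as pattern syntax.
    $_[0] =~ s{(\W|s)(\Q$acronym\E)(\W|s)}{$1<acronym title="$definition">$2</acronym>$3};
    # Remove the markup again wherever it landed inside a title or alt attribute.
    # Braces delimit the pattern because "</acronym>" contains a slash.
    $_[0] =~ s{((title|alt)=".*?)<acronym title="\Q$definition\E">(\Q$acronym\E)</acronym>(.*?")}{$1$3$4}g;
}
$sth->finish();
$dbh->disconnect();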

As you can see, the code is rather simple ("$_[0]" is the content to be parsed):

  • We loop through the contents of the "acronyms" table. The "ORDER BY LENGTH(acronym) DESC" clause ensures that longer acronyms are processed first, which safeguards against some nesting possibilities.
  • We use Perl's RegEx engine to scan the page content. If it finds a match, it wraps the acronym in the appropriate markup.
  • Logically, we only need to give a definition for the first instance of each acronym on any given page, so we omit the RegEx's global modifier ("g").
  • As it stands, our parser will "break" by matching inside other HTML markup. The second RegEx checks for this and removes the added code if it has been placed within a "title" or "alt" attribute.
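
To make the two substitutions easier to follow, here is a minimal, self-contained sketch of the same loop. It assumes an in-memory list in place of the database query, and the sample definitions and the name mark_acronyms() are made up for illustration; they are not part of Shapeshifter.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the rows of the "acronyms" table, longest first.
my @rows = (
    { acronym => 'CD-ROM', definition => 'Compact Disc Read-Only Memory' },
    { acronym => 'CD',     definition => 'Compact Disc' },
);

# Illustrative helper: applies the two substitutions to a copy of the content.
sub mark_acronyms {
    my ($content) = @_;
    for my $row (@rows) {
        my ($acronym, $definition) = @{$row}{qw(acronym definition)};
        # First occurrence only: there is no /g modifier here.
        $content =~ s{(\W|s)(\Q$acronym\E)(\W|s)}{$1<acronym title="$definition">$2</acronym>$3};
        # Strip the markup again if it landed inside a title or alt attribute.
        $content =~ s{((title|alt)=".*?)<acronym title="\Q$definition\E">(\Q$acronym\E)</acronym>(.*?")}{$1$3$4}g;
    }
    return $content;
}

print mark_acronyms('<p>Insert the CD now. The CD spins.</p>'), "\n";
# prints: <p>Insert the <acronym title="Compact Disc">CD</acronym> now. The CD spins.</p>

print mark_acronyms('<img alt="CD cover"> Play the CD.'), "\n";
# prints: <img alt="CD cover"> Play the CD.
```

In the second call the first "CD" sits inside an "alt" attribute, so the markup is added and then removed again, and the later "CD" in the text is never marked.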

This code works pretty well but has two big flaws:

  • Our <acronym> tags can end up nested if the database holds two acronyms where one is contained in the other (i.e. if we have definitions for "CD-ROM" and "CD", then "CD" is matched again inside the "CD-ROM" we have already marked up).
  • Since our parser only matches the first occurrence of any given acronym, if that first occurrence is within a "title" or "alt" attribute, the markup is subsequently removed by the second RegEx, leaving us with no definition for any instance of the acronym.

If anyone can offer an alternative solution that addresses either or both of these flaws, I'd appreciate it if you'd share the knowledge.
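
One possible direction, offered as a rough sketch and not as a tested replacement for the parser: split the content into tags and text, leave the tags alone, and run a single longest-first alternation over the text only. Because each stretch of text is consumed once, "CD" cannot be matched again inside "CD-ROM", and because attribute values live inside tags, they are never touched. The %definitions hash and the name mark_acronyms_once() are made up for illustration. The sketch also assumes that no tag contains a literal ">" in an attribute value and that definitions need no HTML escaping.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the "acronyms" table.
my %definitions = (
    'CD-ROM' => 'Compact Disc Read-Only Memory',
    'CD'     => 'Compact Disc',
);

# One alternation, longest acronym first, so "CD-ROM" wins over "CD".
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b } keys %definitions;
my $pattern = qr/(?<!\w)($alternation)(?!\w)/;

# Illustrative helper: marks the first occurrence of each acronym in text only.
sub mark_acronyms_once {
    my ($content) = @_;
    my %seen;
    my $out = '';
    # split with a capturing group returns the tags as separate pieces.
    for my $piece (split /(<[^>]*>)/, $content) {
        if ($piece =~ /^</) {
            $out .= $piece;    # a tag: copied through unchanged
            next;
        }
        # Text: mark an acronym the first time it is seen, leave repeats alone.
        $piece =~ s{$pattern}{
            $seen{$1}++ ? $1 : qq{<acronym title="$definitions{$1}">$1</acronym>}
        }ge;
        $out .= $piece;
    }
    return $out;
}

print mark_acronyms_once('<img alt="CD cover"> A CD-ROM and a CD, then another CD.'), "\n";
```

With this input the "alt" attribute is left as it was, "CD-ROM" and the first free-standing "CD" each get one <acronym> element, and the final "CD" stays plain. Text inside comments, scripts or existing <acronym> elements would still need special handling.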

All that remained then was to add a little sprinkling of CSS to handle the presentation of the acronyms:

acronym { border-bottom: 1px dotted; cursor: help; }

/* Hides from IE-mac \*/
* html acronym { border: 0; font-weight: bold; font-style: italic; }
/* End hide from IE-mac */

The effect of all this effort:

In Mozilla, acronyms are displayed with a dotted underline; in MSIE, they are displayed in bold italics. In both browsers, hovering the mouse-pointer over an acronym changes the pointer to the help pointer and displays a tooltip containing the acronym's definition.

Best of all though, the Urban Mainframe benefits from a little more accessibility.
