Urban Mainframe

Auto-linking URLs

Date: Wed, 21st-Apr-2004

Tags: CMS, Open Source, Perl, Programming, Urban Mainframe, Website Development

As more and more websites are driven by applications rather than static files, links are becoming increasingly complicated. I have noticed that the RegEx I use for identifying URLs is failing with increasing frequency. So I went in search of a replacement...

I posted a link in my Urbex forum earlier today. It's a relatively simple, harmless-looking URI (http://projectz.org/index.php?cat=3), but Shapeshifter's parser failed to convert it into a hyperlink. I wasn't too bothered: I hopped over to TinyURL.com and generated a shorter link, http://tinyurl.com/yvsq3. Again, the parser failed!

In the parser, a small RegEx performed the URL-to-link conversion (documenting the RegEx is left as an exercise for the reader):

s/(((ht|f)tp):(\/\/)[a-z0-9%&_\-+=:@~#\/.?]+(\/|[a-z]))/<a href="$1" title="go to: $1">$1<\/a>/isg

This RegEx has served me well. I have used it for years, in various web applications, to convert plain-text addresses into hyperlinks. It always worked and I never needed to change it. But today's application-generated URIs have proved too much for it.
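The one-line substitution above is hard to read, so here is a sketch of the same pattern restated with Perl's /x modifier, which allows a comment on each part. It assumes the published line lost some backslashes (an unescaped "/" inside the character class would end the pattern early, and "_-+" is an invalid character range), so those characters are escaped or delimited safely here. The sub name old_autolink is hypothetical. Running it on the two example addresses shows one plausible cause of the failure: the last group insists that a URL ends in a slash or a letter, so the regex engine backtracks past any trailing digits or "=3".

```perl
use strict;
use warnings;

# Hypothetical helper: the old one-line substitution, restated with /x so
# each piece can carry a comment. Assumption: "-", "/" and "#" inside the
# character class are escaped or delimited so the pattern compiles.
sub old_autolink {
    my ($text) = @_;
    $text =~ s{
        (                              # capture group 1: the whole URL
            ((ht|f)tp):                # scheme, http: or ftp:
            (//)                       # the two slashes after the scheme
            [a-z0-9%&_\-+=:@~\#/.?]+   # one or more URL-ish characters
            (/|[a-z])                  # last char must be a slash or letter
        )
    }{<a href="$1" title="go to: $1">$1</a>}gisx;
    return $text;
}

for my $url ('http://projectz.org/index.php?cat=3', 'http://tinyurl.com/yvsq3') {
    print old_autolink($url), "\n";
}
```

Both addresses come back linked but truncated: the first link points at http://projectz.org/index.php?cat with "=3" left as plain text, and the second points at http://tinyurl.com/yvsq with the final "3" outside the link. Whether this matches exactly what Shapeshifter showed depends on the real pattern, but it illustrates why an "allowed last character" rule is fragile for generated URLs. Using braces as delimiters also avoids having to escape every slash.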

Rather than construct a new RegEx myself, I did a little googling and found what appears to be the authoritative RegEx for URL recognition. According to its documentation, this RegEx will identify every legally encoded URI possible. But just look at the thing! Surely there's something less imposing?

I then searched CPAN and found the URI::Find module. I quote: "This module does one thing: Finds URIs and URLs in plain text. It finds them quickly and it finds them all." Perfect, I'll buy it!

I installed URI::Find and loaded Shapeshifter's parser into my editor. However, I struggled with the implementation until I came across an example of URI::Find usage.

That article pointed the way and, with only a minor modification (adding a "title" attribute to the hyperlink), its example became Shapeshifter's new URI finder:

require URI::Find;
my $finder = URI::Find->new( sub {
    my $uri    = shift;  # URI object for the address that was found
    my $string = shift;  # the address exactly as it appeared in the text

    # return the replacement text, i.e. the same text
    # wrapped in <a href="..." title="..."> ... </a>
    return '<a href="' .
        $uri->abs .   # get the absolute address
        '" title="go to: ' .
        $string .     # use the original text as the link title
        '">' .
        $string .     # keep the original text
        '</a>';
    }
);
# find() takes a reference to the text, rewrites it in place and returns
# the number of URIs found; $_[0] is assumed to hold that reference here.
my $found = $finder->find($_[0]);
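Since wiring this up is where I got stuck, here is a small self-contained sketch of calling the finder outside Shapeshifter. It assumes URI::Find is installed from CPAN; the autolink name and the sample text are made up for illustration, and the callback stringifies the URI object rather than calling abs(). The thing to notice is that find() is handed a reference to the text, which it rewrites in place, and that it returns the number of URIs it found.

```perl
use strict;
use warnings;
use URI::Find;   # CPAN module; assumed to be installed

# Hypothetical wrapper: link every URI in the text that $text_ref points
# to, editing the text in place, and return how many URIs were linked.
sub autolink {
    my ($text_ref) = @_;
    my $finder = URI::Find->new(sub {
        my ($uri, $orig) = @_;   # URI object, text as it appeared
        # URI::Find only matches addresses that carry a scheme, so the
        # stringified object is already an absolute address.
        return qq{<a href="$uri" title="go to: $orig">$orig</a>};
    });
    # find() expects a reference to the text, not the text itself.
    return $finder->find($text_ref);
}

my $post  = 'Try http://projectz.org/index.php?cat=3 or http://tinyurl.com/yvsq3 today.';
my $count = autolink(\$post);
print "$count URIs linked: $post\n";
```

Passing the text itself instead of \$post is the easy mistake to make here, because find() dereferences its argument in order to modify the caller's string.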

I've tested this locally with a variety of URIs from my bookmarks, and it successfully identified and converted every one.

Problem Solved!


