
Auto-linking URLs

Date: Wed, 21st-Apr-2004

Tags: CMS, Open Source, Perl, Programming, Urban Mainframe, Website Development

As more and more websites are driven by applications rather than static files, links are becoming increasingly complicated. I have noticed that the RegEx I use for identifying URLs is failing with increasing frequency. So I went in search of a replacement...

I posted a link into my Urbex forum earlier today. It's a relatively simple, harmless-looking URI - http://projectz.org/index.php?cat=3 - but Shapeshifter's parser failed to convert it into a working hyperlink. I wasn't too bothered; I hopped on over to TinyURL.com and generated a shorter link: http://tinyurl.com/yvsq3. Again, the parser failed!

In the parser, a small RegEx was used to perform URL link conversion (documenting the RegEx is left as an exercise for the reader):

s/(((ht|f)tp):(\/\/)[a-z0-9%&_\-+=:@~#\/.?]+(\/|[a-z]))/<a href="$1" title="go to: $1">$1<\/a>/isg

This RegEx has served me well. I have used it for years, in various web applications, to convert plain-text addresses into hyperlinks. It always worked and I never had the need to change it. But today's application-generated URIs have proved to be too much for it.
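For the curious, here is a small sketch of where the old pattern appears to trip up. Its final group demands that a match end in a "/" or a letter, so an address ending in a digit gets cut short. The helper name old_match is mine, purely for illustration, and www.example.com is just a stand-in address that happens to end in a slash:

```perl
use strict;
use warnings;

# The old pattern from the parser, with "/" and "-" escaped so that it
# compiles; the final group still insists on a "/" or a letter.
my $old_re = qr/(((ht|f)tp):(\/\/)[a-z0-9%&_\-+=:@~#\/.?]+(\/|[a-z]))/i;

# Hypothetical helper: returns the text the old pattern would have linked.
sub old_match {
    my ($text) = @_;
    my ($matched) = $text =~ $old_re;
    return $matched;
}

print old_match('http://projectz.org/index.php?cat=3'), "\n";
# prints http://projectz.org/index.php?cat
print old_match('http://tinyurl.com/yvsq3'), "\n";
# prints http://tinyurl.com/yvsq
print old_match('http://www.example.com/'), "\n";
# prints http://www.example.com/
```

In both failing cases the RegEx backtracks to the last letter it can find, so the generated link points at a truncated address.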

Rather than construct a new RegEx myself, I did a little googling and found what appears to be the authoritative RegEx for URL recognition. Reviewing the documentation, it seems this RegEx will identify every legally-encoded URI possible. But just look at the thing! Surely there's something less imposing?

I then searched CPAN and found the "URI::Find" module. I quote, "This module does one thing: Finds URIs and URLs in plain text. It finds them quickly and it finds them all." Perfect, I'll buy it!

I installed URI::Find and loaded Shapeshifter's parser into my editor. However, I struggled with the implementation until I came across an example of URI::Find usage.

That article pointed the way and its example, with only a minor modification (to add a "title" to the hyperlink), became Shapeshifter's new URI finder:

require URI::Find;
my $finder = URI::Find->new( sub {
    my $uri    = shift;  # object representing the URL
    my $string = shift;  # the original text that was matched

    # return the replacement text, i.e. the same text
    # wrapped in <a href="..." title="..."> ... </a>
    return '<a href="' .
        $uri->abs .      # get the absolute address
        '" title="go to: ' .
        $string .        # use the original text as the link title
        '">' .
        $string .        # keep the original text
        '</a>';
} );

# find() expects a reference to the text and rewrites it in place,
# so $_[0] must hold a scalar reference here
my $found = $finder->find($_[0]);

I've tested this locally with a variety of URIs from my bookmarks and it successfully identified and converted every one.
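A check along these lines can be sketched as below. This is not Shapeshifter's code: the autolink helper and the sample sentence are made up for illustration, the URI object is simply stringified rather than passed through abs, and the script assumes URI::Find is installed from CPAN.

```perl
use strict;
use warnings;
use URI::Find;

# Hypothetical helper (not part of Shapeshifter): returns a linked copy
# of $text together with the number of URIs that were found.
sub autolink {
    my ($text) = @_;
    my $finder = URI::Find->new( sub {
        my ($uri, $orig) = @_;   # URI object, original matched text
        return qq{<a href="$uri" title="go to: $orig">$orig</a>};
    } );
    my $count = $finder->find(\$text);   # find() edits $text in place
    return ($text, $count);
}

# The two addresses that defeated the old RegEx.
my @samples = (
    'http://projectz.org/index.php?cat=3',
    'http://tinyurl.com/yvsq3',
);

for my $url (@samples) {
    my ($html, $count) = autolink("See $url for details");
    # The link counts as converted if exactly one URI was found and the
    # whole address ended up inside the href attribute.
    my $ok = ( $count == 1 && index($html, qq{href="$url"}) >= 0 );
    print( ($ok ? 'ok   ' : 'FAIL ') . $url . "\n" );
}
```

Feeding it a longer list, such as an exported bookmarks file, would just mean filling @samples from that file instead.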

Problem Solved!
