This document is available on the Internet at: http://urbanmainframe.com/folders/blog/20040421_a/folders/blog/20040421_a/
As more and more websites are driven by applications rather than static files, links are becoming increasingly complicated. I have noticed that the RegEx I use for identifying URLs is failing with increasing frequency. So I went in search of a replacement...
I posted a link into my Urbex forum earlier today. It's a relatively simple, harmless-looking URI - http://projectz.org/index.php?cat=3 - but Shapeshifter's parser failed to convert it into a hyperlink. I wasn't too bothered; I hopped on over to TinyURL.com and generated a shorter link: http://tinyurl.com/yvsq3. Again, the parser failed!
In the parser, a small RegEx was used to perform URL link conversion (documenting the RegEx is left as an exercise for the reader):
s/(((ht|f)tp):(\/\/)[a-z0-9%&_\-+=:@~#\/.?]+(\/|[a-z]))/<a href="$1" title="go to: $1">$1<\/a>/isg
This RegEx has served me well. I have used it for years, in various web applications, to convert plain-text addresses into hyperlinks. It always worked and I never had the need to change it. But today's application-generated URIs have proved to be too much for it.
Rather than construct a new RegEx myself, I did a little googling and found what appears to be the authoritative RegEx for URL recognition. Reviewing the documentation, it seems this RegEx will identify every legally encoded URI possible. But just look at the thing! Surely there's something less imposing?
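For the curious, here is a minimal sketch of why those two links trip the old pattern. It assumes the hyphen and slash inside the character class are escaped so that the substitution compiles, and `old_linkify` is just a hypothetical wrapper name for the demonstration. The pattern insists that a match ends in a slash or a letter, so a URI that ends in a digit gets cut short:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the old substitution. The "-" and "/" inside
# the character class are escaped here (an assumption) so that it compiles.
sub old_linkify {
    my ($text) = @_;
    $text =~ s/(((ht|f)tp):(\/\/)[a-z0-9%&_\-+=:@~#\/.?]+(\/|[a-z]))/<a href="$1" title="go to: $1">$1<\/a>/isg;
    return $text;
}

# Both of the links from the forum post end in a digit.
for my $url ('http://projectz.org/index.php?cat=3', 'http://tinyurl.com/yvsq3') {
    print old_linkify($url), "\n";
}
```

In both cases the final group backtracks to the last letter, so the link stops at "cat" or "yvsq" and the trailing "=3" or "3" is left outside the anchor as plain text.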
I then searched CPAN and found the "URI::Find" module. I quote, "This module does one thing: Finds URIs and URLs in plain text. It finds them quickly and it finds them all." Perfect, I'll buy it!
I installed URI::Find and loaded Shapeshifter's parser into my editor. However, I found myself struggling with the implementation until I found an example of URI::Find usage.
That article pointed the way, and its example, with only a minor modification (to add a "title" to the hyperlink), became Shapeshifter's new URI finder:
require URI::Find;
my $finder = URI::Find->new( sub {
    my $uri    = shift;  # object representing the URL
    my $string = shift;  # the URL text as it appeared in the input
    # return the replacement text, i.e. the same text
    # wrapped in <a href="..." title="..."> ... </a>
    return '<a href="' .
        $uri->abs .      # get the absolute address
        '" title="go to: ' .
        $string .        # use the original text as the link title
        '">' .
        $string .        # keep the original text
        '</a>';
} );
# find() expects a reference to the text and rewrites it in place,
# returning the number of URIs found; $_[0] is assumed to hold such a reference
my $found = $finder->find($_[0]);
I've tested this locally with a variety of URIs from my bookmarks and it successfully identified and converted every one.
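For anyone who wants to try the same idea outside Shapeshifter, here is a minimal, self-contained sketch. The `linkify` helper and the sample sentence are hypothetical, and for simplicity the link target is the URI object interpolated as a string rather than the result of `abs`. The detail worth noticing is that `find` takes a reference to the text, rewrites it in place, and returns the number of URIs it found:

```perl
use strict;
use warnings;
use URI::Find;    # CPAN module, not part of core Perl

# linkify() is a hypothetical helper name, not part of Shapeshifter.
sub linkify {
    my ($text) = @_;
    my $finder = URI::Find->new( sub {
        my ($uri, $original) = @_;    # URI object, text as it appeared
        # The URI object stringifies to the address when interpolated.
        return qq{<a href="$uri" title="go to: $original">$original</a>};
    } );
    # find() wants a REFERENCE to the text and modifies it in place.
    my $count = $finder->find(\$text);
    return ($text, $count);
}

my ($html, $count) = linkify(
    'See http://projectz.org/index.php?cat=3 and http://tinyurl.com/yvsq3 today'
);
print "$count link(s): $html\n";
```

Returning both the rewritten text and the count keeps the helper easy to check. Anything destined for a real page would also want the URI and title HTML-escaped, which this sketch leaves out.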
Problem Solved!