Search code examples
regexperlurl

How can I format URLs nicely in Perl?


I have a bunch of URLs that I have to turn into links:

for my $url (@url_list) {
    say "<a href='$url'>$url</a>";
}

Is there a module for making the visible URL nicer? A bit like this:

http://www.foo.com/ → www.foo.com
http://www.foo.com/long_path → www.foo.com/lo…

I know a simple regex will probably do here, but I’m spoiled by CPAN. :)


Solution

  • The trick is figuring out how you want to pretty-print each sort of URL, so in that case you need to tell your script what to do in each case:

    use URI;
    
    while( <DATA> ) {
        chomp;
        my $uri = URI->new( $_ );
    
        my $s = $uri->scheme;
        my $rest = do {
            if( $s =~ /(?:https?|ftp)/ ) {
                $uri->host . $uri->path_query
                }
            elsif( $s eq 'mailto' ) {
                $uri->path
                }
            elsif( ! $s ) {
                $uri
                }
            };
    
        print "$uri -> $rest\n";
        }
    
    __END__
    http://www.example.com/foo/bar.html
    www.example.com/foo/bar.html
    ftp://www.example.com
    mailto:joe@example.com
    https://www.example.com/foo?a=b;c=d
    http://joe:password@www.example.com/login
    

    This produces:

    http://www.example.com/foo/bar.html -> www.example.com/foo/bar.html
    www.example.com/foo/bar.html -> www.example.com/foo/bar.html
    ftp://www.example.com -> www.example.com
    mailto:joe@example.com -> joe@example.com
    https://www.example.com/foo?a=b;c=d -> www.example.com/foo?a=b;c=d
    http://joe:password@www.example.com/login -> www.example.com/login
    

    If you want something different for a particular URL, you just need to make a branch for it and put together the parts that you want. Notice the URI also handles schemeless URIs.

    If you don't want long URI strings for your pretty printing, you might throw in something like this to cut off the string after so many characters:

    substr( $rest, 20 ) = '...' if length $rest > 20;
    

    Here's a solution with given, which is slightly cleaner, but also slightly uglier. This is the Perl 5.010 version:

    use 5.010;
    use URI;
    
    while( <DATA> ) {
        chomp;
        my $uri = URI->new( $_ );
    
        my $r;
        given( $uri->scheme ) {
            when( /(?:https?|ftp)/  ) { $r = $uri->host . $uri->path_query }
            when( 'mailto' )          { $r = $uri->path }       
            default                   { $r = $uri }
            }
    
    
        print "$uri -> $r\n";
        }
    

    It's uglier because I have to repeat that assignment to $r. Perl 5.14 is going to fix that though be letting given have a return value. Since that stable version isn't available yet, you have to use the experimental 5.13 track:

    use 5.013004;
    use URI;
    
    while( <DATA> ) {
        chomp;
        my $uri = URI->new( $_ );
    
        my $r = do {
            given( $uri->scheme ) {
                when( /(?:https?|ftp)/  ) { $uri->host . $uri->path_query }
                when( 'mailto' )          { $uri->path }        
                default                   { $uri }
                }
            };
    
        print "$uri -> $r\n";
        }