PHP - preg_replace backreferencing


I am having a hard time understanding how to use preg_replace with backreferences.

I have a plain-text string and want to replace every link with the HTML syntax for a link. So "www.mydomain.tld" or "http://www.mydomain.tld" or "http://mydomain.tld" should be wrapped in an HTML a-tag. I have found a working function that does this online, but I want to understand how to do it myself.

In the function I found, this is the replacement:

"\\1<a href=\"http://\\2\" target=\"_blank\" rel=\"nofollow\">\\2</a>"

I see some escaped quotation marks in there and these bits: \\1 \\2.
According to the PHP documentation these are backreferences. But how do I use them, and what do they do?

I found nothing about that in the spec, so any help would be greatly appreciated!


Solution

  • This will do the job for you. Please see below for an explanation of how it all works.

    $string = 'some text www.example.com more text http://example.com more text https://www.example.com more text';
    
    $string = preg_replace('#\b(?:http(s?)://)?((?:[a-z\d-]+\.)+[a-z]+)\b#', "<a href='http$1://$2'>http$1://$2</a>", $string);
    
    echo $string; // some text <a href='http://www.example.com'>http://www.example.com</a> more text <a href='http://example.com'>http://example.com</a> more text <a href='https://www.example.com'>https://www.example.com</a> more text
    

    \b matches a word boundary.

    (?:http(s?)://)? optionally matches a leading 'http://' or 'https://'. The outer (?:...) group is non-capturing; the inner (s?) captures the 's' when it is present (and an empty string otherwise), so the correct URL can be rebuilt in the replacement.

    (?:[a-z\d-]+\.)+ matches one or more runs of letters, digits, or hyphens, each followed by a period.

    [a-z]+ matches one or more letters: the TLD. Note that TLDs are now open for purchase, so their length can no longer be limited; see http://tinyurl.com/cle6jqb

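    The last two sections together (the whole domain name) are enclosed in one pair of parentheses and captured as group 2; the 's' is captured as group 1. Groups are numbered by the position of their opening parenthesis, and (?:...) groups are not counted.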
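
    To see what each group actually holds, it can help to run the pattern through preg_match_all before doing any replacing. This is only a small sketch: the sample text and the variable names are made up for illustration, and the pattern is copied from the answer above.

    ```php
    <?php
    // Same pattern as in the answer above.
    $pattern = '#\b(?:http(s?)://)?((?:[a-z\d-]+\.)+[a-z]+)\b#';

    // Hypothetical sample text, chosen only for this illustration.
    $text = 'see www.example.com and https://example.org today';

    // PREG_SET_ORDER returns one array per match:
    // [0] is the whole match, [1] and [2] are the capturing groups.
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        printf("full=%s | group 1=%s | group 2=%s\n", $m[0], var_export($m[1], true), $m[2]);
    }
    // full=www.example.com | group 1='' | group 2=www.example.com
    // full=https://example.org | group 1='s' | group 2=example.org
    ```

    Group 1 is an empty string for the first match because the optional scheme was skipped, which is why the replacement falls back to plain 'http://'.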

    We then build the URL:

    <a href='http$1://$2'>http$1://$2</a>
    

    http$1:// rebuilds the scheme: $1 holds 's' if the original URL used HTTPS and is empty otherwise, so the result is 'https://' or 'http://'.

    $2 holds the domain name. The rebuilt URL is used both as the href and as the link text.
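
    As for the \\1 and \\2 in the function from the question: they are the older spelling of the same references. In a double-quoted PHP string "\\1" is the two characters \1, which preg_replace treats like $1. A minimal sketch, using a made-up pattern and input purely to compare the notations:

    ```php
    <?php
    // Hypothetical input and pattern, used only to compare the notations.
    $input   = 'John Smith';
    $pattern = '/(\w+) (\w+)/';

    // "\\2" in a double-quoted string is the two characters \2.
    $a = preg_replace($pattern, "\\2, \\1", $input);

    // $n is the equivalent, and generally preferred, form.
    $b = preg_replace($pattern, '$2, $1', $input);

    // ${n} delimits the group number, useful when a literal digit follows it.
    // Single quotes keep PHP itself from trying to interpolate ${...}.
    $c = preg_replace($pattern, '${2}, ${1}', $input);

    echo $a, "\n";                     // Smith, John
    var_dump($a === $b && $b === $c);  // bool(true)
    ```

    The escaped quotation marks (\") in the original replacement are unrelated to backreferences; they are only needed because that string is itself delimited by double quotes.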