Tags: php, preg-replace, urlencode, php-5.6, preg-replace-callback

php urlencode produces wrong output after rewriting


I'm currently on PHP 5.6. It seems impossible to urlencode a match with preg_replace:

$message = preg_replace('#(https?:\/\/www.domain.nl)(.*)#si', 'https://www.affiliatedomain.com/cread.php?id=1234&affid=12345&clickref=me&p=$1$2', $message, 1);
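Why preg_replace alone can't do this: its replacement argument is an ordinary string that PHP builds before any matching happens. If you wrap a backreference in urlencode(), the function encodes the literal characters `$0` rather than the matched URL. The minimal sketch below demonstrates this; the sample message is made up for illustration.

```php
<?php
// The replacement string is built once, before preg_replace runs,
// so urlencode() sees the literal text '$0', not the matched URL.
$replacement = 'p=' . urlencode('$0');   // '$' is encoded as %24
echo $replacement, "\n";                 // p=%240

// '%240' no longer contains a backreference, so it is inserted verbatim.
$result = preg_replace('~https?://\S+~', $replacement, 'see https://www.domain.nl/internet');
echo $result, "\n";                      // see p=%240
```

Calling a function on each match requires a callback, which is what preg_replace_callback provides.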

So I tried preg_replace_callback instead:

$message = preg_replace_callback('#(https?:\/\/www.domain.nl)(.*)#Usi', function($matches) { return 'https://www.affiliatedomain.com/cread.php?id=1234&affid=12345&clickref=me&p='.urlencode('[['.$matches[0].']]'); }, $message, 1);

This works partially. I also tried $matches[1].$matches[2] instead of $matches[0].

I assume:

$matches[0] = everything matched
$matches[1] = https://www.domain.nl
$matches[2] = /internet
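The assumption about $matches[2] can be checked directly. The U (ungreedy) modifier in the callback attempt inverts greediness, so `(.*)` becomes lazy. Because it is the last token in the pattern, the lazy group is satisfied by matching nothing. Without U, `(.*)` is greedy. On a longer message, combined with the `s` flag, it would also swallow everything after the URL, which is presumably why U was added. This is a minimal sketch using only the URL from the question as input:

```php
<?php
$url = 'https://www.domain.nl/internet';

// The pattern from the callback attempt, with U (ungreedy):
// the lazy `(.*)` at the end of the pattern matches the empty string.
preg_match('#(https?:\/\/www.domain.nl)(.*)#Usi', $url, $ungreedy);

// The same pattern without U: `(.*)` is greedy and captures the path.
preg_match('#(https?:\/\/www.domain.nl)(.*)#si', $url, $greedy);

var_export($ungreedy); // [0] => 'https://www.domain.nl', [2] => ''
echo "\n";
var_export($greedy);   // [2] => '/internet'
echo "\n";
```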

When replacing https://www.domain.nl/internet, I want the output to be:

https://www.affiliatedomain.com/cread.php?id=1234&affid=12345&clickref=me&p=%5B%5Bhttps%3A%2F%2Fwww.domain.nl%2Finternet%5D%5D
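As a sanity check, urlencode() produces exactly the desired `p=` value when it receives the complete URL. This suggests the problem lies in what the regex passes to urlencode(), not in urlencode() itself:

```php
<?php
// Given the full URL, urlencode() yields the expected value:
// '[' -> %5B, ':' -> %3A, '/' -> %2F, ']' -> %5D, '.' is left as-is.
$encoded = urlencode('[[' . 'https://www.domain.nl/internet' . ']]');
echo $encoded, "\n"; // %5B%5Bhttps%3A%2F%2Fwww.domain.nl%2Finternet%5D%5D
```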

But instead I get:

https://www.affiliatedomain.com/cread.php?id=1234&affid=12345&clickref=me&p=%5B%5Bhttps%3A%2F%2Fwww.domain.nl%2F%5D%5Dinternet

No matter what I try, I can't figure it out. I've researched several similar threads here, but to no avail. Let's hope the experts have a solution.


Solution

  • You may use

    '~\shref=[\'"]\Khttps?://www\.domain\.nl(?:/[^\s"\'<>]*)?~i'
    


    Details

    • \s - a whitespace character
    • href= - href=
    • ['"] - a ' or "
    • \K - the match reset operator, which discards all text matched so far
    • https?://www\.domain\.nl - https://www.domain.nl or http://www.domain.nl
    • (?:/[^\s"\'<>]*)? - an optional sequence:
      • / - a / char
      • [^\s"\'<>]* - 0 or more chars other than whitespace, ", ', <, >
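    The effect of \K can be seen by running the pattern with and without it. This is a minimal sketch on a shortened sample link. Without \K, the ` href="` prefix is part of $matches[0] and would be overwritten by a replacement. With \K, only the URL is matched:

```php
<?php
$html = '<a href="https://www.domain.nl/internet">link</a>';

// Without \K: the whole match, including ` href="`, ends up in [0].
preg_match('~\shref=[\'"]https?://www\.domain\.nl(?:/[^\s"\'<>]*)?~i', $html, $withoutK);

// With \K: the reported match start is reset, so [0] is only the URL
// (and a replace would leave ` href="` untouched).
preg_match('~\shref=[\'"]\Khttps?://www\.domain\.nl(?:/[^\s"\'<>]*)?~i', $html, $withK);

echo $withoutK[0], "\n"; //  href="https://www.domain.nl/internet
echo $withK[0], "\n";    // https://www.domain.nl/internet
```

    Note that the path part `[^\s"\'<>]*` stops at the closing quote, so the rest of the tag is never consumed.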

    See the PHP demo:

    // Sample HTML: the URL appears in href, in title and as the link text
    $message = '<a href="https://www.domain.nl/internet" target="_blank" title="https://www.domain.nl/internet">https://www.domain.nl/internet</a>';
    
    // Only the URL right after ` href="` (or `'`) is rewritten; \K keeps
    // the ` href="` prefix out of $matches[0] and out of the replaced text
    $message = preg_replace_callback('~\shref=[\'"]\Khttps?://www\.domain\.nl(?:/[^\s"\'<>]*)?~i', function($matches) { 
        // &amp; is used because the result sits inside an HTML attribute
        return 'https://www.affiliatedomain.com/cread.php?id=1234&amp;affid=12345&amp;clickref=me&amp;p=' . urlencode('[[' . $matches[0] . ']]'); 
    }, $message);
    echo $message; // => <a href="https://www.affiliatedomain.com/cread.php?id=1234&amp;affid=12345&amp;clickref=me&amp;p=%5B%5Bhttps%3A%2F%2Fwww.domain.nl%2Finternet%5D%5D" target="_blank" title="https://www.domain.nl/internet">https://www.domain.nl/internet</a>
    

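    To replace only the first occurrence, pass 1 as the limit argument to preg_replace_callback. Note that this variant drops the \shref= prefix, so it matches the first domain.nl URL anywhere in the string: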

    $message = preg_replace_callback('~https?://www\.domain\.nl(?:/[^\s"\'<>]*)?~i', function($matches) { 
        return 'https://www.affiliatedomain.com/cread.php?id=1234&amp;affid=12345&amp;clickref=me&amp;p=' . urlencode('[[' . $matches[0] . ']]'); 
    }, $message, 1);
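If the message is plain text rather than HTML, as in the question's expected output, a literal `&` can be used instead of `&amp;`. The sketch below wraps the limited replacement in a hypothetical helper, `affiliateLink`, which is not part of the answer. It also uses preg_replace_callback's optional fifth argument, which receives the number of replacements made. The code avoids features newer than PHP 5.6.

```php
<?php
// Hypothetical helper: rewrites the first domain.nl URL in a plain-text
// message into the affiliate link. A literal '&' is used because the
// message is assumed to be plain text (inside an HTML attribute, '&amp;'
// would be correct instead).
function affiliateLink($message, &$count = null)
{
    return preg_replace_callback(
        '~https?://www\.domain\.nl(?:/[^\s"\'<>]*)?~i',
        function ($matches) {
            return 'https://www.affiliatedomain.com/cread.php?id=1234&affid=12345&clickref=me&p='
                . urlencode('[[' . $matches[0] . ']]');
        },
        $message,
        1,      // limit: replace only the first occurrence
        $count  // set to the number of replacements actually made
    );
}

$out = affiliateLink('Visit https://www.domain.nl/internet or https://www.domain.nl/tv', $n);
echo $out, "\n";
echo $n, "\n"; // 1
```

The second URL is left untouched because of the limit of 1, and the first is rewritten to the exact output the question asks for.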