
How to recognize tokens between repeated delimiters?


I am trying to parse templates where tokens are delimited by @ on both sides.

Example input:

Hello, @name@! Please contact admin@example.com, dear @name@!

Desired output:

Hello, Peter! Please contact admin@example.com, dear Peter!

Naive attempt to find matches and replace:

$content = 'Hello, @name@! Please contact admin@example.com, dear @name@!';

echo preg_replace_callback(
    '/(@.*@)/U', function ($token) {
        // the callback receives the array of matches; $token[0] is the full match
        if ('@name@' == $token[0])  //replace recognized tokens with values
            return 'Peter';

        return $token[0];  //ignore the rest
    }, $content);

This regex doesn't deal correctly with a spare @: it matches the first @name@, then @example.com, dear @, and fails to match the second @name@ because its opening @ has already been consumed by the previous match. The output is:

Hello, Peter! Please contact admin@example.com, dear @name@!
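
To make the consumed spans visible, here is a minimal sketch that lists what the ungreedy pattern matches, using the same $content as above. It only inspects the matches with preg_match_all and replaces nothing.

```php
<?php
$content = 'Hello, @name@! Please contact admin@example.com, dear @name@!';

// Collect every span the ungreedy pattern consumes, left to right.
preg_match_all('/@.*@/U', $content, $matches);

// $matches[0] holds the full matches:
//   '@name@' and '@example.com, dear @'
print_r($matches[0]);
```

The second match ends on the @ that should have opened the last @name@, so nothing is left to pair with the final @.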

To prevent spending @, I tried using lookarounds:

$content = 'Hello, @name@! Please contact admin@example.com, dear @name@!';

echo preg_replace_callback(
    '/(?<=@)(.*)(?=@)/U', function ($token) {
        // the callback receives the array of matches; $token[0] is the full match
        if ('name' == $token[0])  //replace recognized tokens with values
            return 'Peter';

        return $token[0];  //ignore the rest
    }, $content);

This correctly matches every substring that's included between a pair of @s, but it doesn't allow me to replace the delimiters themselves. The output is:

Hello, @Peter@! Please contact admin@example.com, dear @Peter@!

How can I pass the callback everything between a pair of @s and have the replacement cover the @s as well?

The tokens will not include newlines or @.

Another example

This is a bit artificial, but it shows what I would like to do, since the current suggestions rely on word boundaries.

For input

Dog@Cat@Donkey@Zebra

I would like the callback to receive Cat, to decide whether @Cat@ should be replaced with the token value, and then to receive Donkey, to decide whether @Donkey@ should be replaced.


Solution

  • Because the delimiters can overlap, I'm not sure this can be done with regexes. However, here is a recursive function that will do the job. This code doesn't care what the token looks like (i.e. it doesn't have to be alphanumeric), so long as it occurs between @ symbols:

    function replace_tokens($tokens, $string) {
        $parts = explode('@', $string, 3);
        if (count($parts) < 3) {
            // none or only one '@' so can't be any tokens to replace
            return implode('@', $parts);
        }
        elseif (array_key_exists($parts[1], $tokens)) {
            // matching token, replace
            return $parts[0] . $tokens[$parts[1]] . replace_tokens($tokens, $parts[2]);
        }
        else {
            // not a matching token, try further along...
            // need to replace the `@` symbols that were removed by explode
            return $parts[0] . '@' . $parts[1] . replace_tokens($tokens, '@' . $parts[2]);
        }
    }
    
    $tokens = array('name' => 'John', 'Cat' => 'Goldfish', 'xy zw' => '45');
    echo replace_tokens($tokens, "Hello, @name@! Please contact admin@example.com, dear @name@!") . "\n";
    echo replace_tokens($tokens, "Dog@Cat@Donkey@Zebra") . "\n";
    echo replace_tokens($tokens, "auhdg@xy zw@axy@Cat@") . "\n";
    $tokens = array('Donkey' => 'Goldfish');
    echo replace_tokens($tokens, "Dog@Cat@Donkey@Zebra") . "\n";
    

    Output:

    Hello, John! Please contact admin@example.com, dear John!
    DogGoldfishDonkey@Zebra
    auhdg45axyGoldfish
    Dog@CatGoldfishZebra
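
Since the question asks for a callback that receives each candidate in turn, the same left-to-right scan can also be written as a loop. This is only a sketch: the function name replace_tokens_with_callback and the convention that the callback returns null for "not a token" are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: split on '@' once and walk the pieces left to right.
function replace_tokens_with_callback($string, callable $resolve)
{
    $parts = explode('@', $string);
    $last = count($parts) - 1;
    $result = $parts[0];
    $i = 1;
    while ($i <= $last) {
        // A piece is a candidate only if another '@' follows it.
        $replacement = $i < $last ? $resolve($parts[$i]) : null;
        if ($replacement !== null) {
            // Known token: emit its value and consume both delimiters.
            $result .= $replacement . $parts[$i + 1];
            $i += 2;
        } else {
            // Not a token: put the '@' back; the closing '@' of this
            // piece may still open the next candidate.
            $result .= '@' . $parts[$i];
            $i += 1;
        }
    }
    return $result;
}

$tokens = array('Donkey' => 'Goldfish');
$resolve = function ($name) use ($tokens) {
    // Assumed convention: null means "leave this span alone".
    return array_key_exists($name, $tokens) ? $tokens[$name] : null;
};

echo replace_tokens_with_callback('Dog@Cat@Donkey@Zebra', $resolve) . "\n";
```

With these tokens the callback is offered Cat first, declines it, and is then offered Donkey, so the result is Dog@CatGoldfishZebra. The loop also avoids deep recursion on inputs that contain many @ characters.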