Regex for PHP preg_replace_callback


Regular expressions are just not my thing. :(

I have a string that may contain multiple substrings such as:

[var1="111" var2="222" var3="333" var4="444"]

I need to replace each occurrence with the result of a callback function that receives the value of each variable.

$string = '[var1="111" var2="222" var3="333" var4="444"]';    
$regex = 'var1="(.*)" var2="(.*)" var3="(.*)" var4="(.*)"';

function doit($matches){
    return "<strong>".implode(", ", $matches) ."</strong>";
}

echo preg_replace_callback($regex,'doit',$string);

I’m expecting 111, 222, 333, 444.

But I’m not feeling the love. Any help would be appreciated!

Edit: some clarification.

I was simplifying for the sake of the question.

  1. The "vars" will not really be called "var1", "var2"; a real tag looks more like [recID="22" template="theName" option="some value", type="list"]

  2. The callback would be more complicated than my simple implode. I would use "id" to connect to a database, get the records and insert them into the template, hence the need for the callback.

  3. Matched values could be almost anything, not just numbers.


Solution

  • I would do it in two steps:

    1. Get the tags
    2. Get the attributes for each tag

    Here’s an example:

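    function doit($match) {
        // $match[1] is the tag body; the capture group keeps each name="value" token as a delimiter.
        $parts = preg_split('/(\w+="[^"]*")/', $match[1], -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
        $attrs = array();
        foreach ($parts as $part) {
            if (trim($part) !== "") {
                if (!preg_match('/^(\w+)="([^"]*)"$/', $part, $attr)) {
                    // syntax error: unexpected text between attributes
                } else {
                    // $attr[1] is the attribute name, $attr[2] its value
                    $attrs[$attr[1]] = $attr[2];
                }
            }
        }
        // The returned string replaces the whole [...] tag.
        return "<strong>" . implode(", ", $attrs) . "</strong>";
    }

    $str = '[var1="111" var2="222" var3="333" var4="444"]';
    // Step 1: match a whole [...] tag, allowing quoted values to contain any character except a double quote.
    $re = '/\[((?:[^"[\]]+|"[^"]*")*)]/';
    echo preg_replace_callback($re, 'doit', $str);
    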
    

    With this you can even detect syntax errors.
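To make the syntax-error point concrete, here is a minimal sketch that collects stray text instead of silently skipping it. The helper name `parse_tag_body` and the shape of the array it returns are assumptions made for illustration; they are not part of the answer's code.

```php
<?php
// Hypothetical helper: splits a tag body into attributes and stray (invalid) text.
function parse_tag_body($body) {
    $attrs = array();
    $errors = array();
    // Same split as in doit(): name="value" tokens are kept as captured delimiters.
    $parts = preg_split('/(\w+="[^"]*")/', $body, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $part) {
        if (trim($part) === "") {
            continue; // whitespace between attributes is fine
        }
        if (preg_match('/^(\w+)="([^"]*)"$/', $part, $attr)) {
            $attrs[$attr[1]] = $attr[2];
        } else {
            $errors[] = trim($part); // anything else counts as a syntax error
        }
    }
    return array('attrs' => $attrs, 'errors' => $errors);
}

// "oops" is not a name="value" pair, so it should be reported as an error.
$result = parse_tag_body('recID="22" oops template="theName"');
```

A callback could then decide whether to leave the tag untouched, log the problem, or render an error message whenever `$result['errors']` is not empty.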

    It’s not a perfect solution: it would be more efficient to read the input character by character and determine the tokens from the context in which they are found.
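A character-by-character reader along those lines might look like the sketch below. It is only one possible design, under stated assumptions: the function name `scan_attributes` is hypothetical, names are limited to letters, digits and underscores, values cannot contain a double quote, and malformed input raises an exception. The value itself is located with `strpos` rather than an explicit inner loop.

```php
<?php
// Hypothetical scanner: walks one tag body from left to right.
// Returns an array of name => value pairs, or throws on malformed input.
function scan_attributes($body) {
    $attrs = array();
    $i = 0;
    $n = strlen($body);
    while ($i < $n) {
        // Skip whitespace between attributes.
        if (ctype_space($body[$i])) {
            $i++;
            continue;
        }
        // Read the attribute name (letters, digits, underscore).
        $start = $i;
        while ($i < $n && (ctype_alnum($body[$i]) || $body[$i] === '_')) {
            $i++;
        }
        $name = substr($body, $start, $i - $start);
        // The name must be followed immediately by =" to open the value.
        if ($name === '' || substr($body, $i, 2) !== '="') {
            throw new InvalidArgumentException("Syntax error at offset $i");
        }
        $i += 2;
        // Read the value up to the closing quote.
        $end = strpos($body, '"', $i);
        if ($end === false) {
            throw new InvalidArgumentException("Unterminated value for $name");
        }
        $attrs[$name] = substr($body, $i, $end - $i);
        $i = $end + 1;
    }
    return $attrs;
}

$attrs = scan_attributes('recID="22" template="theName" option="some value"');
```

Because the scanner knows at every position whether it is between attributes, inside a name, or inside a value, it can report the exact offset of an error, which the regex-based split cannot do directly.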