Extract string in brackets when there are other brackets embedded in quotes


I want to extract this bracketed part from a string:

[list items='["one","two"]' ok="no" b="c"]

I am using the following preg_match call:

preg_match('~\[([a-zA-Z0-9_]+)[ ]+([a-zA-Z0-9]+=[^\[]+)\]~s', $string,$match)

But I have trouble with the brackets that appear within quotes.

I have two files:

theme.html

[list items=""one","[x]tw"'o"" ok="no" b="c""/]
@book
[button text="t'"extB1" name="ok"'" /]
    Asdfz " s wr aw3r '
[button text="t"'extB2" name="no"'" /]

file.php

$string=file_get_contents('theme.html');
for (;;) { 
    if (!preg_match('~\[([a-zA-Z0-9_]+)[ ]+([a-zA-Z0-9]+=[^\[]+)\]~s', $string,$match)) {
        exit;
    }
    $string=str_replace($match[0], '', $string);
    echo "<pre><br>";
    print_r($match);
    echo "<br></pre>";
}

and this is the output:

<pre><br>Array
(
    [0] => [button text="textB1" name="ok"]
    [1] => button
    [2] => text="textB1" name="ok"
)
<br></pre>
<pre><br>Array
(
    [0] => [button text="textB2" name="no"]
    [1] => button
    [2] => text="textB2" name="no"
)
<br></pre>

As you can see, the output does not include:

[list items='["one","two"]' ok="no" b="c"]

I know the problem is caused by the embedded square brackets, but I don't know how I can correct the code to ignore them.


Solution

  • You could use this variation of your preg_match call:

    if (!preg_match('~\[(\w+)\s+(\w+=(?:\'[^\']*\'|[^\[])+?)\]~s', $string, $match))
    

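    The alternative \'[^\']*\' detects an opening single quote and consumes every character up to the next single quote, so an opening bracket inside the quotes no longer blocks the match. Only when that alternative cannot match does the pattern fall back to the part you had, [^\[], one character at a time. I added a ? after the + to make the repetition non-greedy, so it stops at the first closing ] that is not inside single quotes instead of running on to a later one.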

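    Note also that [a-zA-Z0-9_] can be shortened to \w, and [ ] can be replaced by \s, which also allows other white-space characters; I believe that is OK here. Using \w for the attribute name additionally admits an underscore, which your [a-zA-Z0-9]+ did not.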

    See it run on eval.in.
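
As a rough check of the explanation above, the following sketch runs the question's original pattern and the quote-aware variation against the sample tag. The variable names are made up for illustration, and the third pattern, which adds a "[^"]*" alternative for double-quoted values, is an assumed extension that is not part of the answer.

```php
<?php
// Sample tag from the question: the items value holds brackets inside single quotes.
$string = '[list items=\'["one","two"]\' ok="no" b="c"]';

// Pattern from the question: [^\[]+ cannot step over the embedded "[".
$original = '~\[([a-zA-Z0-9_]+)[ ]+([a-zA-Z0-9]+=[^\[]+)\]~s';

// Variation from the answer: a single-quoted run is consumed as one unit.
$quoteAware = '~\[(\w+)\s+(\w+=(?:\'[^\']*\'|[^\[])+?)\]~s';

// Hypothetical extension (an assumption, not in the answer): also accept "..." runs.
$bothQuotes = '~\[(\w+)\s+(\w+=(?:\'[^\']*\'|"[^"]*"|[^\[])+?)\]~s';

$originalFound   = preg_match($original, $string, $m1);
$quoteAwareFound = preg_match($quoteAware, $string, $m2);
$bothFound       = preg_match($bothQuotes, '[list items="[x]" ok="no"]', $m3);

var_dump($originalFound);   // int(0): the original pattern finds nothing
echo $m2[1], PHP_EOL;       // list
echo $m2[2], PHP_EOL;       // items='["one","two"]' ok="no" b="c"
echo $m3[2], PHP_EOL;       // items="[x]" ok="no"
```

The answer's pattern only treats single quotes specially, so a bracket inside a double-quoted value still makes the match fail; the extra alternative in $bothQuotes is one possible way to cover that case.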

    Alternative: match complete lines only

    If quotes can appear anywhere, with no guarantee that they are balanced or that stray brackets only occur inside quoted values, then the pattern above will not work.

    Instead we could require that the match must span a complete line in the text:

    if (!preg_match('~^\s*\[(\w+)\s+(\w+=.*?)\]\s*$~sm', $string, $match))
    

    See it run on eval.in.
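
To illustrate the line-based alternative, here is a minimal sketch that applies the pattern to the theme.html content from the question. It assumes the file content is available as a string (inlined here as a nowdoc) and uses preg_match_all with PREG_SET_ORDER in place of the question's loop; the variable names are chosen for illustration only.

```php
<?php
// Content of theme.html as shown in the question, inlined for the example.
$string = <<<'HTML'
[list items=""one","[x]tw"'o"" ok="no" b="c""/]
@book
[button text="t'"extB1" name="ok"'" /]
    Asdfz " s wr aw3r '
[button text="t"'extB2" name="no"'" /]
HTML;

// Line-anchored pattern from the answer: with the m flag, ^ and $ match at
// line boundaries, so a tag must start and end on the edges of a line.
$pattern = '~^\s*\[(\w+)\s+(\w+=.*?)\]\s*$~sm';

// Collect every match at once; each $m holds [full match, tag name, attributes].
$count = preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ': ', $m[2], PHP_EOL;
}
// list: items=""one","[x]tw"'o"" ok="no" b="c""/
// button: text="t'"extB1" name="ok"'" /
// button: text="t"'extB2" name="no"'" /
```

The captured attribute part keeps the trailing / of self-closing tags, so it may need something like rtrim($m[2], ' /') before further parsing.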