php, regex, recursion, split, delimited

Split a delimited string on commas not enclosed in potentially nested square brackets


I have a string like

"first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth"

I want to explode it into an array:

Array (
    0 => "first",
    1 => "second[,b]",
    2 => "third[a,b[1,2,3]]",
    3 => "fourth[a[1,2]]",
    4 => "sixth"
)

I tried to remove the brackets:

preg_replace("/\[ ( (?>[^\[\]]+) | (?R) )* \]/xis",
             "",
             "first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth"
);

But I got stuck on the next step.


Solution

  • PHP's regex flavor supports recursive patterns, so something like this would work:

    $text = "first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth";
    
    preg_match_all('/[^,\[\]]+(\[([^\[\]]|(?1))*])?/', $text, $matches);
    
    print_r($matches[0]);
    

    which will print:

    Array
    (
        [0] => first
        [1] => second[,b]
        [2] => third[a,b[1,2,3]]
        [3] => fourth[a[1,2]]
        [4] => sixth
    )

    The key here is not to split, but match.

    Whether you want to add such a cryptic regex to your code base is up to you :)

    EDIT

    I just realized that my suggestion above will not match entries that start with [. To handle those as well, use this pattern instead:

    $text = "first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth,[s,[,e,[,v,],e,],n]";
    
    preg_match_all("/
        (             # start match group 1
          [^,\[\]]    #   any char other than a comma or square bracket
          |           #   OR
          \[          #   an opening square bracket
          (           #   start match group 2
            [^\[\]]   #     any char other than a square bracket
            |         #     OR
            (?R)      #     recursively match the entire pattern
          )*          #   end match group 2, and repeat it zero or more times
          ]           #   a closing square bracket
        )+            # end match group 1, and repeat it once or more times
        /x", 
        $text, 
        $matches
    );
    
    print_r($matches[0]);
    

    which prints:

    Array
    (
        [0] => first
        [1] => second[,b]
        [2] => third[a,b[1,2,3]]
        [3] => fourth[a[1,2]]
        [4] => sixth
        [5] => [s,[,e,[,v,],e,],n]
    )
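
If the recursive regex feels too cryptic to maintain, the same result can be had without a regex by walking the string and tracking bracket depth. This is only a sketch of that alternative, not part of the answer above: the function name splitTopLevel is made up for illustration, and it assumes a stray ] with no matching [ should be treated as ordinary text.

```php
<?php
// Hypothetical helper (not from the original answer): splits on commas
// that sit outside every [...] pair by counting bracket depth.
function splitTopLevel(string $text): array
{
    $parts = [];
    $current = '';
    $depth = 0;

    // str_split() walks bytes; that is safe here because the three
    // delimiters are ASCII and never occur inside a multi-byte UTF-8 char.
    foreach (str_split($text) as $char) {
        if ($char === '[') {
            $depth++;
        } elseif ($char === ']' && $depth > 0) {
            $depth--; // assumption: an unmatched ] does not change the depth
        }

        if ($char === ',' && $depth === 0) {
            // Top-level comma: finish the current entry.
            $parts[] = $current;
            $current = '';
        } else {
            $current .= $char;
        }
    }

    $parts[] = $current; // the last entry has no trailing comma
    return $parts;
}

print_r(splitTopLevel("first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth,[s,[,e,[,v,],e,],n]"));
```

For the test string this yields the same six entries as the regex. One difference in behavior: the loop keeps empty entries (so "a,,b" gives three parts), whereas preg_match_all skips them because each match needs at least one character.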