php arrays regex preg-split mathematical-expressions

Split a mathematical expression into an array without splitting subexpressions between parentheses and single quotes


Let's say I have this string:

1 + 2 * (3 + (23 + 53 - (132 / 5) + 5) - 1) + 2 / 'test + string' - 52

I want to split it into an array of operators and non-operators, but anything inside parentheses or single quotes must not be split.

I want the output to be:

[1, "+", 2, "*", "(3 + (23 + 53 - (132 / 5) + 5) - 1)", "+", 2, "/", "'test + string'", "-", 52]

I'm using this code:

preg_split("~['\(][^'()]*['\)](*SKIP)(*F)|([+\-*/^])+~", $str, -1, PREG_SPLIT_DELIM_CAPTURE);

This does what I want for the operators and the single-quoted string, but not for the parentheses: it only keeps (132 / 5) (the deepest nested parenthetical expression) together and splits all the other ones, giving me this output:

[1, "+", 2, "*", "(3", "+", "(23", "+", 53, "-", "(132 / 5)", "+", "5)", "-", "1)", "+", 2, "/", "'test + string'", "-", 52]

How can I ensure that the outermost parenthetical expression and all of its contents remain together?


Solution

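  • You can use a pattern that matches balanced parentheses by recursing into the first capture group with (?1), and then discards that match with (*SKIP)(*F). After the alternation you can still use a capture group for the operators; it is now group 2, and its values are kept because of the PREG_SPLIT_DELIM_CAPTURE flag.

    Group 1 never takes part in a successful operator match, so with PREG_SPLIT_DELIM_CAPTURE alone it shows up as an empty entry next to each operator. To remove the empty entries, you can add the PREG_SPLIT_NO_EMPTY flag.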

    (?:(\((?:[^()]++|(?1))*\))|'[^']*')(*SKIP)(*F)|([+\-*/^])
    


    $str = "1 + 2 * (3 + (23 + 53 - (132 / 5) + 5) - 1) + 2 / 'test + string' - 52";
    $result = preg_split("~(?:(\((?:[^()]++|(?1))*\))|'[^']*')(*SKIP)(*F)|([+\-*/^])~", $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    
    print_r($result);
    

    Output

    Array
    (
        [0] => 1 
        [1] => +
        [2] =>  2 
        [3] => *
        [4] =>  (3 + (23 + 53 - (132 / 5) + 5) - 1) 
        [5] => +
        [6] =>  2 
        [7] => /
        [8] =>  'test + string' 
        [9] => -
        [10] =>  52
    )
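The entries above still carry the surrounding spaces, while the desired output in the question has trimmed tokens and integer values. The following is a minimal sketch of one way to close that gap, not part of the original answer: the constant EXPR_SPLIT_PATTERN and the helper splitExpression() are hypothetical names, and the pattern is the same one as above, only rewritten with the x (extended) modifier so it can carry comments. Casting digit-only tokens to int is an assumption based on the desired output.

```php
<?php
// Hypothetical constant: the pattern from the answer, spread out with the
// x (extended) modifier so whitespace and "#" comments inside it are ignored.
const EXPR_SPLIT_PATTERN = <<<'REGEX'
~
(?:
    (\((?:[^()]++|(?1))*\))   # group 1: balanced parentheses; (?1) recurses into group 1
  |
    '[^']*'                   # a single-quoted string
)
(*SKIP)(*F)                   # skip past the protected text so it is never a split point
|
([+\-*/^])                    # group 2: one operator, returned as a delimiter
~x
REGEX;

// Hypothetical helper: split, then trim each piece and cast plain integers.
function splitExpression(string $str): array
{
    $parts = preg_split(
        EXPR_SPLIT_PATTERN,
        $str,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
    if ($parts === false) {
        // preg_split() returns false on failure, e.g. when a PCRE limit is hit.
        throw new RuntimeException('preg_split failed: ' . preg_last_error());
    }

    $tokens = [];
    foreach ($parts as $part) {
        $part = trim($part);
        if ($part === '') {
            continue; // whitespace-only piece, e.g. between two adjacent operators
        }
        // Assumption: digit-only tokens should be ints, as in the desired output.
        $tokens[] = preg_match('/^\d+$/D', $part) === 1 ? (int) $part : $part;
    }
    return $tokens;
}

$str = "1 + 2 * (3 + (23 + 53 - (132 / 5) + 5) - 1) + 2 / 'test + string' - 52";
var_export(splitExpression($str));
```

For the sample string this should give the array asked for in the question, with the outer parenthetical and the quoted string each kept as a single trimmed token. Trimming after the split keeps the regex unchanged; consuming the whitespace inside the pattern instead would be another option.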