
Split string on dots unless the dot is inside double quotes


I hate regular expressions, but for what I'm doing I'm sure there is no simpler option. Anyway, I have been working with this expression:

/(([a-zA-z_]+)[\.]?+)+/

To try to match input like this (the chain can continue with more dot-separated parts):

"text.lol".something.another

And have preg_match return an array like this:

Array
(
    [0] => "text.lol"
    [1] => something
    [2] => another
)

But instead, all I'm getting is the first matched item twice in the array. Why is that?
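A likely explanation (not stated in the original thread): `preg_match()` stops after the first match, and a capture group inside a repeated `(...)+` only keeps the text from its last iteration, so you get the whole match plus leftover captures rather than one entry per segment. Separately, `[a-zA-z_]` is probably a typo for `[a-zA-Z_]`: the range `A-z` also covers `[`, `\`, `]`, `^`, `_` and the backtick. The sketch below uses a simplified, hypothetical pattern to show both behaviours:

```php
<?php
// Hypothetical simplified pattern: word runs, each optionally followed by a dot, repeated.
$pattern = '/(?:(\w+)\.?)+/';

// preg_match() returns only the first match; group 1 holds just its LAST iteration.
preg_match($pattern, 'a.b.c', $single);
// $single === ['a.b.c', 'c']

// preg_match_all() with a pattern that matches ONE segment returns every segment.
preg_match_all('/\w+/', 'a.b.c', $all);
// $all[0] === ['a', 'b', 'c']

// The range A-z spans ASCII 65..122, which includes ^ (94).
$caretMatches = preg_match('/^[A-z]$/', '^'); // 1
```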


Solution

  • This gives the output you want for the input you specified:

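    $s = '"text.lol".something.another';
    // Match either a double-quoted run (quotes included) or a run of non-dot characters.
    // The quoted alternative is tried first, so a segment that starts with a quote is consumed whole, dots included.
    preg_match_all('/"[^"]+"|[^.]+/', $s, $m);
    $values = $m[0]; // array('"text.lol"', 'something', 'another')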
    print_r($values);
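If you would rather split on the separators than match the segments, PCRE's `(*SKIP)(*FAIL)` backtracking verbs (available in PHP's PCRE engine) can express "a dot, unless it is inside a quoted section". This is a swapped-in technique, not part of the answer above, and like the one-liner it assumes quotes are balanced and never escaped:

```php
<?php
$s = '"text.lol".something.another';

// Alternative 1 consumes a whole quoted section, then deliberately fails;
// (*SKIP) makes the next attempt start after that section, so its inner dots are never tried.
// Alternative 2 is the real delimiter: a literal dot outside quotes.
$parts = preg_split('/"[^"]*"(*SKIP)(*FAIL)|\./', $s);
print_r($parts);
```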
    

    Here's a fuller implementation, with an encoder and a matching decoder, that allows escaped quotes:

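    // Join values with dots: escape quotes/backslashes, and quote any value containing a dot.
    function encode($original) {
        foreach ($original as &$s) {
            $s = addslashes($s);
            if (strpos($s, '.') !== false) $s = '"'.$s.'"';
        }
        unset($s); // break the reference left over from foreach
        return join('.', $original);
    }
    
    // Split an encoded string back into the original values.
    function decode($s) {
        // These regular expressions courtesy of ridgerunner:
        // group 1 = contents of a quoted segment (escapes allowed), group 2 = an unquoted segment.
        preg_match_all('/"([^"\\\\]*+(?:\\\\.[^"\\\\]*)*)"|([^.]+)/', $s, $m);
        // This one has poorer performance, but is easier to read:
        // preg_match_all('/"((?:\\\\.|[^"\\\\])+)"|([^.]+)/', $s, $m);
        $values = array();
        // Unmatched groups come back as '', so take whichever group matched, then undo addslashes().
        foreach ($m[1] as $k => $v) $values[] = stripslashes($v !== '' ? $v : $m[2][$k]);
        return $values;
    }
    
    $test_cases = array('a.b', 'a\\', '.a\\', 'a.b"c', '"a');
    $encoded = encode($test_cases);
    $decoded = decode($encoded);
    
    echo '<pre>Encoded: '.$encoded."\n";
    echo print_r($decoded, true).'</pre>';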
    

    Output:

    Encoded: "a.b".a\\.".a\\"."a.b\"c".\"a
    Array
    (
        [0] => a.b
        [1] => a\
        [2] => .a\
        [3] => a.b"c
        [4] => "a
    )
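One limitation of the implementation above: because the unquoted branch is `([^.]+)`, `decode()` can never return an empty segment, so an array containing `''` does not round-trip (`['a', '', 'b']` encodes to `a..b`, which decodes to `['a', 'b']`). A minimal variant that avoids this is sketched below, assuming it is acceptable to quote every segment and that PHP 7.4+ is available for arrow functions. `encode_all()` and `decode_all()` are hypothetical names, and this version escapes only `"` and `\` with `addcslashes()`:

```php
<?php
// Quote every segment so that empty strings survive the round trip.
function encode_all(array $values): string {
    $quoted = array_map(
        fn(string $v): string => '"' . addcslashes($v, '"\\') . '"',
        $values
    );
    return implode('.', $quoted);
}

function decode_all(string $s): array {
    // A segment is a quote, then any mix of plain characters or backslash-escape pairs, then a quote.
    preg_match_all('/"((?:[^"\\\\]|\\\\.)*)"/s', $s, $m);
    // Undo the escaping: "\x" becomes "x".
    return array_map(
        fn(string $v): string => preg_replace('/\\\\(.)/s', '$1', $v),
        $m[1]
    );
}

$cases = ['a.b', 'a\\', '', '.a\\', 'a.b"c', '"a'];
$encoded = encode_all($cases);
echo $encoded, "\n";
var_dump(decode_all($encoded) === $cases); // bool(true)
```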