php regex preg-split

preg_split by space and tab outside quotes


I am trying to get preg_split() to split the following two strings on whitespace. The same pattern needs to handle both spaces and tabs.

autodiscover.microsoft.com. 3600 IN A   131.107.125.5

and

microsoft.com.      3600    IN  TXT "v=spf1 include:_spf-a.microsoft.com include:_spf-b.microsoft.com include:_spf-c.microsoft.com -all"

The trick is that in the second string, the quoted part at the end should not be split.

From searching Stack Overflow, I found that I probably need something like this:

$results = preg_split("/'[^']*'(*SKIP)(*F)|\x20/", $str);

Sadly, I cannot get it to work. I have also tried variations such as the following, but nothing works:

"\s+"(*SKIP)(*F)|\x20

Thanks in advance.


Solution

  • Split your input with the regex below. \h+ matches one or more horizontal whitespace characters, i.e. spaces and tabs.

    (?:'[^']*'|"[^"]*")(*SKIP)(*F)|\h+
    

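    (?:'[^']*'|"[^"]*") matches any single- or double-quoted string. (*SKIP)(*F) then forces that match to fail and tells the engine not to retry from any position inside the quoted string, so the search resumes just after it. The only successful matches therefore come from the alternative after the |, here \h+, which matches runs of horizontal whitespace outside quotes. Your original pattern skipped only single-quoted strings, while the input uses double quotes, and \x20 matches a single space rather than a run of spaces or tabs.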


    $str = 'microsoft.com.      3600    IN  TXT "v=spf1 include:_spf-a.microsoft.com include:_spf-b.microsoft.com include:_spf-c.microsoft.com -all"';
    $match =  preg_split('~(?:\'[^\']*\'|"[^"]*")(*SKIP)(*F)|\h+~', $str);
    print_r($match);
    

    Output:

    Array
    (
        [0] => microsoft.com.
        [1] => 3600
        [2] => IN
        [3] => TXT
        [4] => "v=spf1 include:_spf-a.microsoft.com include:_spf-b.microsoft.com include:_spf-c.microsoft.com -all"
    )
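Since the question asks for one pattern that works on both records, here is a minimal sketch that applies the same regex to each of them. The helper name splitRecord, the tab characters placed in the sample strings, and the PREG_SPLIT_NO_EMPTY flag are assumptions added for illustration and are not part of the answer above.

```php
<?php
// Hypothetical helper (not from the original answer): split one DNS record
// line on runs of spaces/tabs while leaving quoted strings intact.
function splitRecord(string $line): array
{
    // Quoted strings are matched first and then discarded via (*SKIP)(*F);
    // only horizontal whitespace outside quotes acts as a delimiter.
    $pattern = '~(?:\'[^\']*\'|"[^"]*")(*SKIP)(*F)|\h+~';

    // PREG_SPLIT_NO_EMPTY drops empty fields that leading or trailing
    // whitespace would otherwise produce.
    return preg_split($pattern, $line, -1, PREG_SPLIT_NO_EMPTY);
}

// Sample records from the question; the tabs are assumed here to show that
// mixed spaces and tabs are handled the same way.
$a = "autodiscover.microsoft.com. 3600 IN A\t131.107.125.5";
$b = "microsoft.com.\t3600\tIN  TXT "
   . '"v=spf1 include:_spf-a.microsoft.com include:_spf-b.microsoft.com include:_spf-c.microsoft.com -all"';

print_r(splitRecord($a));
print_r(splitRecord($b));
```

Both calls should return five fields, with the quoted TXT value kept as a single element including its surrounding double quotes. If the quotes are not wanted, trim($fields[4], '"') would be one way to strip them afterwards.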