
Split string by newlines


This question is very similar to "use preg_split instead of split", but I'm still confused about the regex and would like to clear that up.

I'm trying to update some existing split() calls to use preg_split() instead, and I'm getting results I don't understand. Running the code below gives me arrays of different lengths, and I'm not sure why.

From what I can see, split is matching on \n with an optional \r beforehand. I think preg_split() is doing the same, but then why is it creating 2 splits? Is this to do with lazy/greedy matching?

Demo code:

$test = "\r\n";

$val = split('\r?\n', $test); // single quotes: PHP passes the backslashes through literally
$val_new = split("\r?\n", $test); // double quotes: PHP converts \r and \n to EOL chars first
$val2 = preg_split('/\r?\n/', $test);

var_dump($val); // prints array(1) { [0]=> string(2) "\r\n" } (the CR LF pair, unsplit)
var_dump($val2); // prints array(2) { [0]=> string(0) "" [1]=> string(0) "" }

Edit: added $val_new based on Kolink's comments, because they helped clear up my understanding of the problem and may be of use to others too.


Solution

  • split does not understand \r and \n as special characters, and because you used single quotes, PHP doesn't treat them as special characters either. So split ends up looking for the literal text \n or \r\n rather than for actual line-break characters.

    preg_split, on the other hand, does understand \r and \n as special characters, so even though PHP doesn't treat them as such, PCRE does, and the string is therefore split correctly.

    This has nothing to do with lazy/greedy matching. It's all because single quotes don't convert \r and \n into their line-break meanings.
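
    As a rough illustration of the points above, the sketch below compares the two quoting styles and shows where the two empty strings come from. The variable names ($single, $double, $parts, $nonEmpty, $lines) are made up for this example, and only preg_split() is used because split() was removed in PHP 7.0.

```php
<?php
// Single quotes: PHP keeps the backslashes, so this is 5 characters of text: \ r ? \ n
$single = '\r?\n';
// Double quotes: PHP converts the escapes, so this is 3 bytes: CR, '?', LF
$double = "\r?\n";

var_dump(strlen($single)); // int(5)
var_dump(strlen($double)); // int(3)

$test = "\r\n";

// PCRE interprets \r and \n itself, so the single-quoted pattern still matches CR LF.
// The whole input is one delimiter, which leaves an empty string on each side of it.
$parts = preg_split('/\r?\n/', $test);
var_dump(count($parts)); // int(2)

// PREG_SPLIT_NO_EMPTY drops empty pieces, so nothing is left for this input.
$nonEmpty = preg_split('/\r?\n/', $test, -1, PREG_SPLIT_NO_EMPTY);
var_dump(count($nonEmpty)); // int(0)

// With real content, mixed CR LF and LF line endings both split as expected.
$lines = preg_split('/\r?\n/', "one\r\ntwo\nthree");
var_dump($lines === ['one', 'two', 'three']); // bool(true)
```

    So the array of length 2 is the expected result: the delimiter matched once, and preg_split returned the (empty) text before and after that match.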