php regex wordpress preg-match preg-split

Regular Expressions (Specifically preg_split() PHP)


I'm listing out some dates in my PHP application which results in something like the following:

April2016May2016June2016 etc.

I'm trying to use preg_split to format them like this:

array('April 2016', 'May 2016', 'June 2016')

I used an online regular expression editor to work out how to detect 4 consecutive digits, and here's how far I've gotten:

Note: I am also removing all whitespace. Ideally it would only be removed where there are more than 2 consecutive spaces, i.e. "hello world" would not be altered but "hello   world" would.

preg_split('/\d\d\d\d/g', preg_replace('!\s+!', '', $sidebar_contents));

Using the above, I get an error saying that the g modifier is not valid, I assume because this is not preg_match_all. Removing the g results in the following:

[Screenshot of the resulting output]

Thanks for any help!


Solution

  • Here is a way to achieve what you want with one call to preg_match_all followed by array_map:

    preg_match_all('~(\p{L}+)(\d+)~', "April2016May2016June2016", $m);
    $result = array_map(function($k, $v) { return $k . " " . $v; }, $m[1], $m[2]);
    print_r($result);
    


    The pattern means:

    • (\p{L}+) - match and capture into Group 1 (will be accessible after matching via $m[1]) one or more letters
    • (\d+) - match and capture into Group 2 (will be accessible after matching via $m[2]) one or more digits.

    With array_map, we just join the values from Group 1 and 2 with a space.
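A variant of the same idea, shown here only as a sketch: passing the PREG_SET_ORDER flag makes preg_match_all group its results per match rather than per capture group, so the month and year can be joined in a plain loop without array_map. The $input variable is a placeholder for the string from the question.

```php
<?php
// Placeholder input with the same shape as the string in the question.
$input = "April2016May2016June2016";

// With PREG_SET_ORDER, each element of $matches is one full match:
// $match[1] holds the letters (month), $match[2] holds the digits (year).
preg_match_all('~(\p{L}+)(\d+)~', $input, $matches, PREG_SET_ORDER);

$result = [];
foreach ($matches as $match) {
    $result[] = $match[1] . ' ' . $match[2];
}

print_r($result); // April 2016, May 2016, June 2016
```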

    Alternative: build the resulting array inside a preg_replace_callback callback (a single pass):

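    $result = array();
    // The callback is used only for its side effect: collecting "Month Year" strings.
    preg_replace_callback('~(\p{L}+)(\d+)~', function($m) use (&$result) {
        array_push($result, $m[1] . " " . $m[2]);
        return $m[0]; // the replaced string is discarded, so the return value does not matter here
    }, "April2016May2016June2016");
    print_r($result);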
    

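Since the question asked about preg_split specifically, here is a rough sketch of how it could be used instead. This is not part of the answer above. Two points it relies on: PHP's PCRE functions have no g modifier (preg_split already splits at every match), and splitting on /\d\d\d\d/ consumes the years, so the sketch splits on a zero-width position using lookarounds. The sample value of $sidebar_contents is made up, and the last line assumes the whitespace note in the question means "collapse runs of two or more whitespace characters into one space".

```php
<?php
// Made-up sample standing in for $sidebar_contents from the question.
$sidebar_contents = "April2016  May2016\n\nJune2016";

// Strip all whitespace, as the question does.
$compact = preg_replace('/\s+/', '', $sidebar_contents);

// Split at the zero-width position that has a 4-digit year behind it and a
// letter ahead of it; nothing is consumed, so the years are kept.
$chunks = preg_split('/(?<=\d{4})(?=\p{L})/', $compact);

// Insert a space at the boundary between the month letters and the year digits.
// preg_replace accepts an array subject and returns an array.
$dates = preg_replace('/(?<=\p{L})(?=\d)/', ' ', $chunks);

print_r($dates); // April 2016, May 2016, June 2016

// Assumption: collapse only runs of 2+ whitespace characters; single spaces stay.
$collapsed = preg_replace('/\s{2,}/', ' ', "hello    world"); // "hello world"
```

Whether this is preferable to preg_match_all is a judgment call: the lookaround split stays close to the original attempt, while the capture-group approach makes the month and year available separately.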