
Split string on the nth semicolon in a string


I need help finding a PCRE pattern to use with preg_split().

I'm using the regex pattern below to split a string based on its starting three-character code and semicolons. The pattern works fine in JavaScript, but now I need to use it in PHP. I tried preg_split() but I'm just getting back junk.

// Each group begins with a three-letter code and has three segments separated by semicolons. The string is not terminated with a semicolon.

// Pseudocode    
string_to_split = "AAA;RED;111;BBB;BLUE;22;CCC;GREEN;33;DDD;WHITE;44"

// This works in JS  
// https://regex101.com  
$pattern = "/[AAA|BBB|CCC|DDD][^;]*;[^;]*[;][^;]*/gi";

Match 1  
Full match  0-11    `AAA;RED;111`  
Match 2  
Full match  12-23   `BBB;BLUE;22`  
Match 3  
Full match  24-36   `CCC;GREEN;33`  
Match 4  
Full match  37-49   `DDD;WHITE;44`  

$pattern = "/[AAA|BBB|CCC|DDD][^;]*;[^;]*[;][^;]*/";  
$split = preg_split($pattern, $string_to_split);

returns

array(5)  
    0:""  
    1:";"  
    2:";"  
    3:";"  
    4:""  

Solution

  • You don't want to split your string; you want to match elements, so use preg_match_all:

    $str = "AAA;RED;111;AAA;Oh my dog;2.34;AAA;Oh Long John;.4556;BBB;Oh Long Johnson;1.2323;BBB;Oh Don Piano;.33;CCC;Why I eyes ya;1.445;CCC;All the live long day;2.3343;DDD;Faith Hilling;.89";
    $res = preg_match_all('/(?:AAA|BBB|CCC|DDD);[^;]*;[^;]*;?/', $str, $m);
    print_r($m[0]);
    

    Output:

    Array
    (
        [0] => AAA;RED;111;
        [1] => AAA;Oh my dog;2.34;
        [2] => AAA;Oh Long John;.4556;
        [3] => BBB;Oh Long Johnson;1.2323;
        [4] => BBB;Oh Don Piano;.33;
        [5] => CCC;Why I eyes ya;1.445;
        [6] => CCC;All the live long day;2.3343;
        [7] => DDD;Faith Hilling;.89
    )
    

    Explanation:

    /                       : regex delimiter
      (?:AAA|BBB|CCC|DDD)   : non-capturing group: AAA or BBB or CCC or DDD
      ;                     : a semicolon
      [^;]*                 : 0 or more characters that are not semicolons
      ;                     : a semicolon
      [^;]*                 : 0 or more characters that are not semicolons
      ;?                    : optional semicolon
    /                       : regex delimiter
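
Some background on the junk output in the question: `[AAA|BBB|CCC|DDD]` is a character class, so it matches a single character (A, B, C, D or |) rather than one of the four codes, and preg_split() discards whatever the pattern matches and returns the text between matches, which here is only the separating semicolons. The `g` flag from the JavaScript version is also not a PCRE modifier in PHP.

If the groups are wanted without the trailing semicolon, as in the regex101 matches in the question, two variations may help. This is a minimal sketch using the question's sample string; `$str`, `$matches` and `$parts` are illustrative names. The first variation drops the optional `;?` from the pattern. The second keeps preg_split() and splits on a semicolon only when a lookahead sees a code plus a semicolon right after it, which assumes a code never appears as the full text of a middle or last segment.

```php
<?php
// Sample string from the question.
$str = "AAA;RED;111;BBB;BLUE;22;CCC;GREEN;33;DDD;WHITE;44";

// Variation 1: match each group and leave the separator between groups unmatched.
preg_match_all('/(?:AAA|BBB|CCC|DDD);[^;]*;[^;]*/', $str, $matches);
print_r($matches[0]);

// Variation 2: split on a semicolon only when a code and a semicolon follow it.
// The lookahead (?=...) inspects the following text without consuming it,
// so the code stays at the start of the next piece.
$parts = preg_split('/;(?=(?:AAA|BBB|CCC|DDD);)/', $str);
print_r($parts);
```

For this input, both calls should yield the same four groups: `AAA;RED;111`, `BBB;BLUE;22`, `CCC;GREEN;33` and `DDD;WHITE;44`.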