php, regex, preg-match-all, text-extraction

Extract multiple numbers from a string after a specific substring


I have the following string:

H: 290​‐​314 P: 280​‐​301+330​​​​U+200B+331​string‐​305+351+338​‐​308+310 [2]

I need all the numbers after P:, that is, [280,301,330,331,305,351,338,308,310].

Note that the string contains the literal text U+200B, which is a character code and should be ignored.

I tried #P:\s((\d+)[​\‐]+)+# but that doesn't work.


Solution

  • You can use

    (?:\G(?!\A)(?:[^\d\s]*200B)?|P:\h*)[^\d\s]*\K(?!200B)\d+
    

    Details:

    • (?:\G(?!\A)(?:[^\d\s]*200B)?|P:\h*) - either the end of the previous successful match, optionally followed by zero or more chars other than digits/whitespace and then 200B, or P: followed by zero or more horizontal whitespace chars
    • [^\d\s]* - zero or more chars other than digits and whitespace
    • \K - match reset operator that discards the text matched so far from the overall match memory buffer
    • (?!200B)\d+ - one or more digits, provided they do not start a 200B char sequence.
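
The breakdown above can also be written into the pattern itself. The following is a minimal sketch, assuming the x (extended) modifier is acceptable: it is the same pattern, only spread over several lines with inline comments. The variable names are illustrative.

```php
<?php
// Same pattern as above, with the x (extended) modifier: whitespace in the
// pattern is ignored and "#" starts a comment that runs to the end of the line.
$pattern = '~
    (?:
        \G(?!\A)            # continue where the previous match ended, but not at the string start
        (?:[^\d\s]*200B)?   # optionally skip separators followed by 200B
      |
        P:\h*               # or start at P: plus any horizontal whitespace
    )
    [^\d\s]*                # skip chars that are neither digits nor whitespace
    \K                      # drop everything matched so far from the reported match
    (?!200B)\d+             # a run of digits that does not begin with 200B
~x';

$text = 'H: 290‐314 P: 280‐301+330U+200B+331string‐305+351+338‐308+310 [2]';
preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```

Because [^\d\s]* never crosses whitespace, the chain of \G matches stops at the space before [2], so the trailing 2 is not picked up.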

    See the PHP demo:

    $text = 'H: 290‐314 P: 280‐301+330U+200B+331string‐305+351+338‐308+310 [2]';
    if (preg_match_all('~(?:\G(?!\A)(?:[^\d\s]*200B)?|P:\h*)[^\d\s]*\K(?!200B)\d+~', $text, $matches)) {
        print_r($matches[0]);
    }
    

    Output:

    Array
    (
        [0] => 280
        [1] => 301
        [2] => 330
        [3] => 331
        [4] => 305
        [5] => 351
        [6] => 338
        [7] => 308
        [8] => 310
    )
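
If a single regex is not a hard requirement, the same result can probably be reached in two simpler steps. This is only a sketch, not part of the answer above, and it rests on two assumptions taken from the sample string: the list after P: contains no whitespace, and the char code always appears as the literal text U+200B. The variable names are illustrative.

```php
<?php
$text = 'H: 290‐314 P: 280‐301+330U+200B+331string‐305+351+338‐308+310 [2]';

$numbers = [];
// Step 1: capture the chunk after "P:" up to the next whitespace
// (assumption: the list itself contains no whitespace).
if (preg_match('~P:\h*(\S+)~', $text, $m)) {
    // Step 2: remove the literal "U+200B" markers, then collect every digit run.
    $list = str_replace('U+200B', '', $m[1]);
    preg_match_all('~\d+~', $list, $found);
    $numbers = $found[0];
}
print_r($numbers);
```

Unlike the single-pattern version, this variant only ignores 200B when it is preceded by U+, so a bare 200B in the input would be reported as the number 200.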