I have the following string:
H: 290‐314 P: 280‐301+330U+200B+331string‐305+351+338‐308+310 [2]
I need all the numbers after P:, i.e. [280,301,330,331,305,351,338,308,310].
Note that there is this U+200B, which is a char code and should be ignored.
I tried #P:\s((\d+)[\‐]+)+# but that doesn't work.
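A repeated capturing group only keeps the value of its last iteration, so a single match cannot return every number. Instead, you can match the numbers one at a time, anchoring each match to the end of the previous one with \G: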
(?:\G(?!\A)(?:[^\d\s]*200B)?|P:\h*)[^\d\s]*\K(?!200B)\d+
See the regex demo.
Details:
(?:\G(?!\A)(?:[^\d\s]*200B)?|P:\h*) - either the end of the previous successful match, optionally followed by zero or more chars other than digits/whitespace and then 200B, or P: and zero or more horizontal whitespaces
[^\d\s]* - zero or more chars other than digits and whitespace
\K - match reset operator that discards the text matched so far from the overall match memory buffer
(?!200B)\d+ - one or more digits that do not start the 200B char sequence.
See the PHP demo:
$text = 'H: 290‐314 P: 280‐301+330U+200B+331string‐305+351+338‐308+310 [2]';
if (preg_match_all('~(?:\G(?!\A)(?:[^\d\s]*200B)?|P:\h*)[^\d\s]*\K(?!200B)\d+~', $text, $matches)) {
    print_r($matches[0]);
}
Output:
Array
(
[0] => 280
[1] => 301
[2] => 330
[3] => 331
[4] => 305
[5] => 351
[6] => 338
[7] => 308
[8] => 310
)
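If a single pattern feels hard to maintain, the same result can probably be reached in two steps. The sketch below is an alternative to the regex above, not a restatement of it: it first isolates the run of non-whitespace characters after P:, then removes the literal U+200B text and collects the remaining digit runs. The function name numbersAfterP is hypothetical, and the code assumes the marker always appears exactly as the text "U+200B" and that the numbers of interest never contain whitespace.

```php
<?php
// Hypothetical helper, shown only to illustrate a two-step alternative.
function numbersAfterP(string $text): array
{
    // Step 1: capture the non-whitespace run that follows "P:".
    // The u flag makes PCRE treat the subject as UTF-8 (the hyphens are U+2010).
    if (!preg_match('~P:\h*(\S+)~u', $text, $m)) {
        return [];
    }

    // Step 2: drop the literal "U+200B" markers (assumed spelling),
    // then collect every remaining run of digits.
    $segment = str_replace('U+200B', '', $m[1]);
    preg_match_all('~\d+~', $segment, $matches);

    // Convert the matched strings to integers.
    return array_map('intval', $matches[0]);
}

$text = 'H: 290‐314 P: 280‐301+330U+200B+331string‐305+351+338‐308+310 [2]';
print_r(numbersAfterP($text));
```

For the sample string this yields the same nine numbers as above, as integers rather than strings. Unlike the single regex, it only skips 200B when it is written exactly as U+200B.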