I have some data that will be in one of the following formats:
Word Number Word Number
Word Number Word Word Number
Word Word Number Word Number
Word Word Number Word Word Number
I would like to extract the Word(s) up until the numbers, and the numbers. Here is what I have at the moment (which looks OK to me, but I don't fully understand regex).
preg_match('/([A-Za-z ])([0-9])([A-Za-z ])([0-9])/', $game, $info);
print_r($info);
However, the array is empty. I know I've seen ^ and + and $ used before, but I'm not quite sure how to work them into the regex.
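Your pattern finds nothing because none of its character classes has a quantifier, so each group matches exactly one character. To match strings in the format you described, you need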
preg_match_all('/^([a-z]+(?:\s+[a-z]+)?)\s+([0-9]+)\s+([a-z]+(?:\s+[a-z]+)?)\s+([0-9]+)$/im', $game, $info);
See the regex demo. The same pattern in a PHP test against all four formats:
$re = '~^([a-z]+(?:\s+[a-z]+)?)\s+([0-9]+)\s+([a-z]+(?:\s+[a-z]+)?)\s+([0-9]+)$~im';
$game = "Word 123 Word 456\nWord 1234 Word Word 3456\nWord Word 3455 Word 4566\nWord Word 4434 Word Word 44332";
preg_match_all($re, $game, $info);
print_r($info);
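By default, preg_match_all groups the results by capture group, so $info[1] holds every first name, $info[2] every first number, and so on. If you would rather walk the input one line at a time, the PREG_SET_ORDER flag may be more convenient. This is only a sketch: the variable name $rows and the shortened sample input are my own choices for illustration.

```php
<?php
$re = '~^([a-z]+(?:\s+[a-z]+)?)\s+([0-9]+)\s+([a-z]+(?:\s+[a-z]+)?)\s+([0-9]+)$~im';
$game = "Word 123 Word 456\nWord Word 3455 Word 4566";

// PREG_SET_ORDER returns one array per match instead of one array per group.
preg_match_all($re, $game, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    // $row[0] is the whole matched line; $row[1]..$row[4] are the four groups.
    printf("%s = %s, %s = %s\n", $row[1], $row[2], $row[3], $row[4]);
}
```

Each $row then carries everything that belongs to one input line, which avoids indexing into four parallel arrays.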
The regex explanation:
^ - start of string
([a-z]+(?:\s+[a-z]+)?) - Group 1 for the Word Word or Word pattern
\s+ - one or more whitespaces
([0-9]+) - Group 2 for the Number pattern
\s+ - one or more whitespaces
([a-z]+(?:\s+[a-z]+)?) - Group 3 for the Word Word or Word pattern
\s+ - one or more whitespaces
([0-9]+) - Group 4 for the Number pattern
$ - end of string

The /i modifier makes the pattern case-insensitive. The /m modifier is used for testing only (it makes ^ and $ match the start and end of each line, not just of the whole string).
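If $game holds a single entry, as in your original code, plain preg_match without /m should be enough. Below is a minimal sketch under that assumption. The sample value 'Home Team 3 Away 1' and the variable names are hypothetical, since I don't know what your real data looks like.

```php
<?php
// Same pattern without /m: ^ and $ now anchor the whole string.
$re = '/^([a-z]+(?:\s+[a-z]+)?)\s+([0-9]+)\s+([a-z]+(?:\s+[a-z]+)?)\s+([0-9]+)$/i';
$game = 'Home Team 3 Away 1'; // hypothetical sample input

$firstName = $firstNumber = $secondName = $secondNumber = null;

if (preg_match($re, $game, $info)) {
    // $info[0] is the full match, so skip it and unpack the four groups.
    [, $firstName, $firstNumber, $secondName, $secondNumber] = $info;
    echo "$firstName: $firstNumber, $secondName: $secondNumber\n";
} else {
    echo "No match\n";
}
```

The numbers come back as strings; cast them with (int) if you need to do arithmetic on them.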
The [a-z]+(?:\s+[a-z]+)? subpattern means: match one or more letters with [a-z]+, then match one or zero occurrences of a sequence of one or more whitespaces (\s+) followed by one or more letters ([a-z]+). Thus, this pattern effectively matches 1 or 2 words separated by whitespace.
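To see the one-or-two-words behaviour in isolation, you can anchor just that subpattern and try it on a few inputs. This is an illustrative sketch; the sample strings and the $results array are made up for the test.

```php
<?php
// The word subpattern on its own, anchored so the whole input must fit.
$words = '/^[a-z]+(?:\s+[a-z]+)?$/i';

$samples = ['Word', 'Word Word', 'Word Word Word', 'Word 12'];
$results = [];

foreach ($samples as $sample) {
    // preg_match returns 1 on a match and 0 when there is none.
    $results[$sample] = preg_match($words, $sample) === 1;
}

var_export($results);
```

Only the first two samples match: a third word or a trailing number leaves text that the optional group cannot consume before $.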