I'm trying to split up a string into two parts using regex. The string is formatted as follows:
text to extract<number>
I've been using (.*?)<
and <(.*?)>
which work fine but after reading into regex a little, I've just started to wonder why I need the ?
in the expressions. I've only done it like that after finding them through this site so I'm not exactly sure what the difference is.
It is the difference between greedy and non-greedy quantifiers.
Consider the input 101000000000100
.
Using 1.*1
, *
is greedy - it will match all the way to the end, and then backtrack until it can match 1
, leaving you with 1010000000001
.
.*?
is non-greedy. *
will match nothing, but then will try to match extra characters until it matches 1
, eventually matching 101
.
All quantifiers have a non-greedy mode: .*?
, .+?
, .{2,6}?
, and even .??
.
In your case, a similar pattern could be <([^>]*)>
- matching anything but a greater-than sign (strictly speaking, it matches zero or more characters other than >
in-between <
and >
).