Tags: regex, regex-greedy, non-greedy

What is the difference between .*? and .* regular expressions?


I'm trying to split up a string into two parts using regex. The string is formatted as follows:

text to extract<number>

I've been using (.*?)< and <(.*?)>, which work fine, but after reading up on regex a little, I've started to wonder why I need the ? in the expressions. I only wrote them that way after finding them on this site, so I'm not exactly sure what the difference is.


Solution

  • It is the difference between greedy and non-greedy quantifiers.

    Consider the input 101000000000100.

    With 1.*1, the .* is greedy: it first matches all the way to the end of the input, then backtracks until the final 1 can match, leaving you with 1010000000001.
    With 1.*?1, the .*? is non-greedy: it first matches nothing, then takes one extra character at a time until the final 1 can match, ending up with 101.
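
The two behaviours can be checked directly. This is a minimal sketch in Python, chosen here only for illustration since the question does not name a language. It uses the standard `re` module and the same input as above; the variable names are arbitrary.

```python
import re

text = "101000000000100"

# Greedy: .* grabs as much as possible, then backtracks to the last 1.
greedy = re.search(r"1.*1", text).group()

# Non-greedy: .*? grabs as little as possible, stopping at the first 1 it can reach.
lazy = re.search(r"1.*?1", text).group()

print(greedy)  # 1010000000001
print(lazy)    # 101
```

Both patterns start matching at the first 1; only the end of the match differs.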

    All quantifiers have a non-greedy mode: .*?, .+?, .{2,6}?, and even .??.

    In your case, an alternative pattern is <([^>]*)>, which captures anything but a greater-than sign (strictly speaking, zero or more characters other than > between < and >).
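
As a rough sketch of how this applies to the string in the question, here are the three patterns side by side, again in Python with the standard `re` module. The inputs `sample` and `multi` are made-up examples, not taken from the question. With exactly one < and one >, greedy and non-greedy captures come out the same, so the ? only starts to matter once the input can contain more than one bracketed part.

```python
import re

# Hypothetical input in the "text to extract<number>" format.
sample = "text to extract<42>"

before = re.search(r"(.*?)<", sample).group(1)            # text before "<"
inside_lazy = re.search(r"<(.*?)>", sample).group(1)      # non-greedy capture
inside_negated = re.search(r"<([^>]*)>", sample).group(1) # negated character class

print(before)          # text to extract
print(inside_lazy)     # 42
print(inside_negated)  # 42

# Hypothetical input with two bracketed parts, where greedy overshoots.
multi = "a<1> b<2>"

greedy_multi = re.search(r"<(.*)>", multi).group(1)
lazy_multi = re.search(r"<(.*?)>", multi).group(1)
negated_multi = re.search(r"<([^>]*)>", multi).group(1)

print(greedy_multi)   # 1> b<2
print(lazy_multi)     # 1
print(negated_multi)  # 1
```

The negated class gives the same result as the non-greedy version here, without relying on the quantifier stopping early.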

    See Quantifier Cheat Sheet.