Why does this regex:
>>> r = re.compile("[0-9]*", re.DEBUG)
match like this:
>>> m = r.search("abc")
>>> m.group()
''
I was hoping that it would match the entire string 'abc', since 'a' fulfills the condition (viz. match 0 digits), and then the greedy match would include the string 'abc' in its entirety.
You asked "find me zero or more digits", so it found you zero or more digits (zero; empty string).
If you wanted "find me zero or more digits followed by zero or more other characters", you need to say that (with the .* pattern). '[0-9]*' does not match 'abc', because 'abc' includes characters (letters) not included in the requested expression.
>>> r = re.compile('[0-9]*.*') # Note the very important ".*" that matches everything!
>>> r.search('abc').group()
'abc'
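To see which part of the pattern consumes what, here is a small sketch that wraps the two halves in capturing groups. The groups and the variable names are additions for illustration only; they are not part of the original pattern.

```python
import re

# Same pattern as above, with each half in its own group (illustrative only).
split = re.compile("([0-9]*)(.*)")

# No leading digits: the digit part matches the empty string,
# and ".*" takes the rest of the line.
no_digits = split.search("abc").groups()

# Leading digits: "[0-9]*" is greedy and takes all of them first,
# then ".*" takes whatever is left.
with_digits = split.search("123abc").groups()

print(no_digits)    # ('', 'abc')
print(with_digits)  # ('123', 'abc')
```

Note that .* does not match newlines unless re.DOTALL is passed, so "everything" here means everything up to the end of the line.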
The point is the word "match". If your expression does not contain [a representation of] a certain character (such as "a"), then it cannot possibly match a string that contains that character! Your given expression matches only strings consisting of zero or more digits and nothing else, so it clearly doesn't match 'abc'; the only thing search can find there is the empty string in front of the 'a'.
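A quick way to check this yourself is to compare search with fullmatch, which only succeeds if the pattern consumes the whole string. This is a minimal sketch; the variable names are mine, and re.fullmatch assumes Python 3.4 or later.

```python
import re

digits = re.compile("[0-9]*")

# search() tries each starting position from left to right and returns the
# first success. At index 0 the pattern succeeds at once with zero digits.
empty_span = digits.search("abc").span()

# fullmatch() demands that the entire string fit the pattern.
whole_abc = digits.fullmatch("abc")   # letters are not digits, so no match
whole_123 = digits.fullmatch("123")   # all digits, so this matches

# Even when digits appear later, the empty match at index 0 is found first.
early = digits.search("abc123").group()

print(empty_span)         # (0, 0)
print(whole_abc)          # None
print(whole_123.group())  # 123
print(repr(early))        # ''
```

The last case is worth noticing: greedy means "take as many as possible starting here", not "look for the longest match anywhere in the string".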
As Tigerhawk has mentioned in the comments, if the * in regular expressions meant "zero or more of the preceding pattern, or anything else", it would be extraordinarily useless, as any pattern with a * in it would match all strings, all the time!