Why doesn't '[0-9]*' match 'abc' in my Python regular expression since there are zero or more digits in the string?


Why does this regex:

>>> import re
>>> r = re.compile("[0-9]*", re.DEBUG)

match like this:

>>> m = r.search("abc")
>>> m.group()
''

I was hoping it would match the entire string 'abc': 'a' fulfills the condition (it matches zero digits), and then the greedy match would take in the rest of 'abc' as well.


Solution

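  • You asked "find me zero or more digits", so it found you zero or more digits: zero of them, i.e. the empty string at the very start of 'abc'. search() tries each starting position from left to right and stops at the first one where the pattern succeeds. At position 0 the pattern succeeds by matching nothing, so the greedy * has no digits to consume and the search never moves past the 'a'.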

    If you wanted "find me zero or more digits followed by zero or more of any character", you need to say that (with the .* pattern). '[0-9]*' cannot match the whole of 'abc', because 'abc' contains characters (letters) that the expression does not allow.

    >>> import re
    >>> r = re.compile('[0-9]*.*')  # Note the very important ".*": it matches all remaining characters (except newlines, by default)
    >>> r.search('abc').group()
    'abc'
    

    The point is the word "match". If your expression contains nothing that can match a certain character (such as "a"), then it cannot possibly match the whole of a string that contains that character! Your given expression matches only strings consisting entirely of zero or more digits. Therefore it can never match all of 'abc'; the most it can do is match the empty string in front of the 'a'.


    As Tigerhawk has mentioned in the comments, if the * in regular expressions meant "zero or more of the preceding pattern, or anything else", it would be extraordinarily useless, as any pattern with a * in it would match all strings, all the time!
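A short script illustrating why search() returns '' here, written for Python 3 (the question uses 2.7, where print is a statement). It is a sketch, not part of the original answer: it shows that greediness only applies at the position where a match starts and never makes the engine skip over non-digits to reach later ones.

```python
import re

# The pattern from the question: zero or more ASCII digits, nothing else.
digits = re.compile(r"[0-9]*")

# search() tries position 0 first. At 'a', [0-9]* succeeds by matching
# zero digits, so the search stops with an empty match spanning (0, 0).
m = digits.search("abc")
print(repr(m.group()), m.span())  # '' (0, 0)

# Greediness only means "as many digits as possible at this position".
# When the digits start at position 0, all of them are consumed.
print(repr(digits.search("123abc").group()))  # '123'

# Here the digits come later, but the empty match at position 0 is found
# first, so search() never skips ahead to '42'.
print(repr(digits.search("abc42").group()))  # ''

# Requiring at least one digit with + makes search() look further along.
print(re.search(r"[0-9]+", "abc42").group())  # 42
```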
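The sense of "match" used above (the pattern must account for the entire string) can be checked directly by anchoring the pattern. A minimal sketch, assuming Python 3.4+ for re.fullmatch(); all_digits is a hypothetical helper name used only for illustration, and the \Z anchor is shown as the equivalent for Python 2.7, which lacks fullmatch().

```python
import re

def all_digits(s):
    """Hypothetical helper: True if s consists solely of zero or more ASCII digits."""
    # fullmatch() succeeds only if the pattern covers the whole string.
    return re.fullmatch(r"[0-9]*", s) is not None

print(all_digits(""))      # True: zero digits is allowed
print(all_digits("2024"))  # True
print(all_digits("abc"))   # False: 'a', 'b', 'c' are not in [0-9]

# Python 2.7 has no fullmatch(); anchoring the end with \Z behaves the same here.
print(re.match(r"[0-9]*\Z", "abc") is not None)  # False

# * repeats only the item right before it, so the letters must still be
# spelled out (or covered by something like .*) for 'abc' to match.
print(re.fullmatch(r"[0-9]*abc", "abc") is not None)  # True
```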