python regex

Python group(0) meaning


What is the exact definition of group(0) in re.search?

Sometimes the pattern gets complex. What is group(0) supposed to return, by definition?

To give an example of where the confusion comes from, consider this match. The printed result is only def, so in this case group(0) doesn't seem to return the entire match.

>>> import re
>>> m = re.search('(?<=abc)def', 'abcdef')
>>> m.group(0)

Output:

def

Solution

  • match_object.group(0) returns the entire substring that the pattern matched.

    group(0) is easiest to understand by comparing it with group(1), group(2), ..., group(n). group(0) is the whole match. Parentheses in the pattern define additional capturing groups: group(1) is the text matched by the first parenthesized subpattern, group(2) is the text matched by the second, and so on. Groups are numbered by the position of their opening parenthesis, counting from left to right, and each opening parenthesis pairs with its own matching closing parenthesis, so groups can be nested. This probably sounds confusing, which is why there is an example below.

    The parentheses in '(?<=abc)' are a different matter: they have a different syntactic meaning and do not create a capturing group. '(?<=...)' is a positive look-behind assertion. It succeeds only if the text immediately before the current position matches 'abc', but it does not consume that text, so 'abc' is not part of the match. That is why m.group(0) in your example is 'def': 'def' really is the entire match.

    In the following example, 'abc' is the text required by the look-behind.

    No parentheses are needed for group 0, since it always refers to the whole match.

    The opening parenthesis before the letter 'd' pairs with the closing parenthesis before the letter 'f' to form group 1.

    The parentheses around the letter 'e' define group 2.

    import re
    
    m = re.search('(?<=abc)(d(e))f', 'abcdef')
    
    print(m.group(0))
    print(m.group(1))
    print(m.group(2))
    

    This prints:

    def
    de
    e
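
To see that the look-behind really is excluded from the match, it can help to print the span of each group as well. The sketch below reuses the pattern from the example above and adds two comparison searches that are not part of the original question (a pattern that matches 'abc' normally, and an input string 'xyzdef' chosen purely for illustration).

```python
import re

text = 'abcdef'

# Look-behind: 'abc' must precede the match but is not consumed.
m = re.search(r'(?<=abc)(d(e))f', text)
print(m.group(0), m.span(0))  # def (3, 6)  -> whole match starts after 'abc'
print(m.group(1), m.span(1))  # de (3, 5)   -> first opening parenthesis
print(m.group(2), m.span(2))  # e (4, 5)    -> second opening parenthesis
print(m.groups())             # ('de', 'e') -> capturing groups only, no group 0

# Same groups, but 'abc' is matched normally and so becomes part of group 0.
n = re.search(r'abc(d(e))f', text)
print(n.group(0), n.span(0))  # abcdef (0, 6)

# If 'abc' does not precede 'def', the look-behind fails and there is no match.
no_match = re.search(r'(?<=abc)def', 'xyzdef')
print(no_match)               # None
```

Calling m.group() with no argument is equivalent to m.group(0).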