java regex capturing-group

Clarification about regex capturing groups


Directly from the Java API documentation (Ctrl+F for "Group name"):

The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification then its previously-captured value, if any, will be retained if the second evaluation fails. Matching the string "aba" against the expression (a(b)?)+, for example, leaves group two set to "b". All captured input is discarded at the beginning of each match.

I know how capturing groups work and how they interact with backreferences. However, I do not understand the part of the API documentation quoted above. Could somebody explain it in other words?

Thanks in advance.


Solution

  • That quote says that:

    If you apply a quantifier (+, *, ? or {m,n}) to a capture group and the group matches more than once, only the last match is associated with the group; all earlier captures are overwritten.

    For example, if you match (a)+ against the string "aaaaaa", capture group 1 will refer to the last a.

    Now consider a nested capture group, as in the example from your quote:

    `(a(b)?)+`
    

    Matching this regex against the string "aba" is a single match in which the outer group is evaluated twice:

    • "ab" - Capture Group 1 = "ab" (outer parentheses), Capture Group 2 = "b" (inner parentheses)
    • "a" - Capture Group 1 = "a", Capture Group 2 is not re-captured. The inner group (b)? is optional, so the second iteration succeeds on the last a without it.

    So, in the end, Capture Group 1 contains "a", which overrides the earlier capture "ab", and Capture Group 2 still contains "b", which is not overridden.
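
If you want to check this behaviour yourself, here is a minimal sketch using java.util.regex. The class name CaptureGroupDemo and the helpers groupAfterMatch and groupPerMatch are made up for illustration; only Pattern and Matcher come from the JDK. The last call also illustrates the final sentence of the quote: with find(), captures from one match do not carry over into the next one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {

    // Hypothetical helper: returns the text held by `group` after matching the
    // whole input, or null if the input does not match or the group never captured.
    public static String groupAfterMatch(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(group) : null;
    }

    // Hypothetical helper: collects the value of `group` for each successive
    // find() match, so captures can be compared from one match to the next.
    public static List<String> groupPerMatch(String regex, String input, int group) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            values.add(m.group(group)); // null when the group did not participate
        }
        return values;
    }

    public static void main(String[] args) {
        // Quantified group: only the last iteration's capture is kept.
        System.out.println(groupAfterMatch("(a)+", "aaaaaa", 1));     // a

        // Nested groups: group 1 is overwritten by the second iteration,
        // group 2 keeps the "b" captured during the first iteration.
        System.out.println(groupAfterMatch("(a(b)?)+", "aba", 1));    // a
        System.out.println(groupAfterMatch("(a(b)?)+", "aba", 2));    // b

        // Two separate matches ("ab", then "a"): group 2 is reset in between,
        // so the second match reports null rather than the old "b".
        System.out.println(groupPerMatch("(a(b)?)+", "ab a", 2));     // [b, null]
    }
}
```

Note the difference between the last two cases: within one match the old value of group 2 survives, but across separate matches it does not.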