java regex capture-group

Are non-capturing groups redundant?


Are optional non-capturing groups redundant?

Is the following regex:

(?:wo)?men

semantically equivalent to the following regex?

(wo)?men

Solution

  • (?:wo)?men and (wo)?men match exactly the same strings. They differ only in that the first uses a non-capturing group and the second a capturing group. So the real question is: why use non-capturing groups when capturing ones are available?

    Non-capturing groups are sometimes helpful:

    1. To avoid an excessive number of backreferences (remember that backreferences higher than 9 are sometimes difficult to use).
    2. To stay under the limit of 99 numbered backreferences, by reducing the number of numbered capturing groups (source: Regular-expressions.info: "Most regex flavors support up to 99 capturing groups and double-digit backreferences.").
      NOTE: this limit does not apply to the Java regex engine, nor to the PHP or .NET regex engines.
    3. To reduce the overhead of storing the captures.
    4. To add more groupings to an existing regex without changing the numbering of its capturing groups.
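
The effect on group numbering can be checked directly in Java. The following is a minimal sketch, not part of the original answer: the class name `GroupCountDemo` and its helper methods are made up for illustration, and `(men)` stands in for an "existing" capturing group that a new optional prefix group is added in front of.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Number of capturing groups declared by the pattern.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Whether the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Text captured by group n when the whole input matches, else null.
    static String group(String regex, String input, int n) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        // Both variants accept and reject the same inputs.
        for (String s : new String[] {"men", "women", "woman"}) {
            System.out.println(s + ": " + matches("(?:wo)?men", s)
                    + " / " + matches("(wo)?men", s));
        }

        // Only the capturing variant creates a numbered group.
        System.out.println(groupCount("(?:wo)?men")); // 0
        System.out.println(groupCount("(wo)?men"));   // 1

        // Adding a capturing group in front shifts the existing group to number 2;
        // adding a non-capturing group leaves it at number 1.
        System.out.println(group("(wo)?(men)", "women", 1));   // wo
        System.out.println(group("(?:wo)?(men)", "women", 1)); // men
    }
}
```

With the non-capturing prefix, any code that already reads group 1 keeps working unchanged.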

    They also make the match results cleaner:

    You can use a non-capturing group to retain the organisational or grouping benefits but without the overhead of capturing.

    Refactoring existing regular expressions just to convert capturing groups to non-capturing ones does not seem a good idea, though: it may break code that relies on the group numbers, or require too much effort.
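
To illustrate how such a conversion can break existing code, here is a small sketch under assumed conditions: `RefactorRiskDemo` and `prefixOf` are hypothetical names standing in for a caller that was written against the capturing version of the pattern and reads group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RefactorRiskDemo {
    // Hypothetical caller that assumes the pattern declares a group 1.
    static String prefixOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Works with the original capturing group.
        System.out.println(prefixOf("(wo)?men", "women")); // wo

        // After converting to a non-capturing group, group 1 no longer exists
        // and Matcher.group(1) throws IndexOutOfBoundsException.
        try {
            prefixOf("(?:wo)?men", "women");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("group 1 no longer exists: " + e.getMessage());
        }
    }
}
```

The same kind of failure can occur with replacement strings that refer to `$1`, so every use of a pattern has to be reviewed before its groups are converted.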