Java regex error - Look-behind with group reference


I'm trying to build a regex that matches exactly two consecutive occurrences of the same character, where the character comes from a character class. This is the regex I came up with:

(?<!\1)([^raol1c])\1(?!\1)

As you can see, it uses a negative lookbehind and a negative lookahead. But, as usual, the lookbehind does not work: Java throws the well-known exception "Look-behind group does not have an obvious maximum length", even though the group clearly has a maximum length (exactly one character).
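
A minimal sketch that reproduces the error at compile time. The class name `LookbehindError` and the helper `compileError` are made up for illustration; only `Pattern.compile` and `PatternSyntaxException.getDescription` come from `java.util.regex`.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LookbehindError {
    // Hypothetical helper: returns the compiler's description of the problem,
    // or null if the pattern compiles without error.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // The pattern from the question; the backreference sits directly in the lookbehind.
        System.out.println(compileError("(?<!\\1)([^raol1c])\\1(?!\\1)"));
    }
}
```

The failure happens in `Pattern.compile`, before any input is matched, so it does not depend on the test strings.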

Ideally the regex should match "hh", "jhh", "ahh", "hhj", "hha" but not "hhh".

Any ideas on how to work around this?


Solution

  • Here is a workaround. It's ugly but apparently it works:

    (?<!(?=\1).)([^raol1c])\1(?!\1)
    

    Putting the backreference into a zero-width lookahead inside the lookbehind leaves the lookbehind itself with an obviously fixed length (one character), so the pattern compiles.

    Disclaimer: I did not come up with this (unfortunately); see: Backreferences in lookbehind

    EDIT:

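    The above pattern does not rule out hhh, most likely because group 1 has not captured anything yet when the lookbehind is evaluated, so \1 cannot match there and the negative lookbehind always succeeds. However, this works: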

    (?<!(.)(?=\1))([^raol1c])\2(?!\2)
    

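    If we create the first group inside the lookbehind, it captures the character just before the current position, and the lookahead on \1 then ensures that the first character after the lookbehind is not the same as the one before it. The original capturing group becomes group 2, hence \2.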

    Working demo.
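
To check both workarounds against the strings from the question, something like the following sketch can be used. The class name `DoubledCharCheck`, the constants `FIRST` and `REVISED`, and the helper `found` are illustrative names, not part of the original answer; the patterns themselves are copied from above, with backslashes doubled for Java string literals.

```java
import java.util.regex.Pattern;

class DoubledCharCheck {
    // First workaround: compiles, but \1 is still unset when the lookbehind runs.
    static final Pattern FIRST =
            Pattern.compile("(?<!(?=\\1).)([^raol1c])\\1(?!\\1)");

    // Revised workaround: group 1 is captured inside the lookbehind itself,
    // and the doubled character is group 2.
    static final Pattern REVISED =
            Pattern.compile("(?<!(.)(?=\\1))([^raol1c])\\2(?!\\2)");

    // Hypothetical helper: true if the pattern matches anywhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"hh", "jhh", "ahh", "hhj", "hha", "hhh"};
        for (String s : inputs) {
            System.out.println(s + " first=" + found(FIRST, s)
                    + " revised=" + found(REVISED, s));
        }
    }
}
```

This uses `find()` rather than `matches()`, since the question asks for the doubled character to be located inside longer strings such as "jhh". With the revised pattern, the first five inputs should report a match and "hhh" should not.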