
Java Regex missing a match in the output


I am currently matching a string against a regular expression. My pattern is:

"(?<=\p{Alnum}|\p{Punct})(\p{Alnum}+\p{Punct}{1})"

I am matching it with the string:

"https://www.google.com/"

My desired result with the above regex and string is:

https:, www., google., com/

All the matches come out correctly except the first one: I get 'ttps:' instead of the expected 'https:'.

I am not able to understand where I went wrong. Can anyone please help me in figuring this out?


Solution

  • You can use

    (?<![^\p{Alnum}\p{Punct}])(\p{Alnum}+\p{Punct})
    

    See the online regex demo.

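    The (?<![^\p{Alnum}\p{Punct}]) negative lookbehind matches a position that is not immediately preceded by a char other than an alphanumeric or punctuation char. In other words, the position is either at the start of the string or right after an alphanumeric or punctuation char.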

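    Note that your regex required an alphanumeric or punctuation char immediately on the left, so it could never match at the start of the string. The first position where your lookbehind succeeds is right after the h, which is why you get ttps: instead of https:.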

    Note that {1} is always redundant. You can see more about regex redundancy in my "Writing cleaner regular expressions" YouTube video.
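
To compare the two patterns side by side, here is a minimal Java sketch. The class name LookbehindDemo and the helper findAll are made up for this illustration; only the two regexes and the input string come from the question and answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answer.
class LookbehindDemo {
    // Pattern from the question: the positive lookbehind needs a char on the left.
    static final String ORIGINAL = "(?<=\\p{Alnum}|\\p{Punct})(\\p{Alnum}+\\p{Punct}{1})";
    // Pattern from the answer: the negative lookbehind also succeeds at the start of the string.
    static final String FIXED = "(?<![^\\p{Alnum}\\p{Punct}])(\\p{Alnum}+\\p{Punct})";

    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String url = "https://www.google.com/";
        System.out.println(findAll(ORIGINAL, url)); // [ttps:, www., google., com/]
        System.out.println(findAll(FIXED, url));    // [https:, www., google., com/]
    }
}
```

The only difference between the two runs is the first match: with the positive lookbehind the engine has to skip index 0, while the negative lookbehind lets the match start there.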