java, regex, word-boundary

JAVA REGEX :: Could you explain this?


My pattern is [a-z][\\*\\+\\-_\\.\\,\\|\\s]?\\b

My Result:

a__
not matched
a_.
pattern matched = a_
a._
pattern matched = a.
a..
pattern matched = a

Why is my first input the only one that is not matched? Thanks in advance.

[ PS: got the same result with [a-z][\\*\\+\\-\\_\\.\\,\\|\\s]?\\b ]


Solution

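  • Because, unlike the period ., the underscore _ is a word character (\w covers letters, digits, and _). So a_ is a single word, while a. is a word followed by punctuation. A word boundary \b only matches between a word character and a non-word character (or the start or end of the input).

    So, a__ matches a, then matches the first _, then fails to match a word boundary, since the next _ is part of the same word. The engine backtracks and lets the optional character class match nothing, but there is no boundary between a and _ either, so the whole match fails.

    a_. matches a, then _, then the word boundary between the word character _ and the punctuation ., giving a_. Likewise, a._ matches a, then ., then the boundary between . and the word character _, giving a. as the result.

    a.. matches a, then tries the first ., but there is no word boundary between two periods. The engine backtracks, skips the optional character class, and matches the word boundary between the word a and the punctuation that follows it, giving a.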
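
The behaviour described above can be reproduced with a short sketch. It assumes the results in the question came from Matcher.find() rather than matches(), since partial matches are reported; the class name WordBoundaryDemo and the helper firstMatch are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // The pattern from the question, written as a Java string literal.
    static final Pattern PATTERN =
            Pattern.compile("[a-z][\\*\\+\\-_\\.\\,\\|\\s]?\\b");

    // Returns the first match found anywhere in the input, or null if none.
    // Assumption: find() is used, so a partial match is enough.
    static String firstMatch(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"a__", "a_.", "a._", "a.."}) {
            String match = firstMatch(input);
            System.out.println(input + " -> "
                    + (match == null ? "not matched" : "pattern matched = " + match));
        }
    }
}
```

If the underscore should be treated as a separator here, \b is the wrong tool, because it always counts _ as part of a word; a lookahead such as (?![a-z]) in place of \b is one possible alternative, depending on what is meant to follow the match.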