Tags: java, regex, regex-lookarounds

Can I put an assertion in a character class in regex?


I want to write a regex that matches strings containing letters, spaces, and numbers followed by dots. Such numbers can appear anywhere in the string, and there can be more than one of them. For example:

Foo            -> Match
Foo Bar        -> Match
Foo 1 Bar      -> No Match
Foo 1. Bar     -> Match
Foo 11. Bar    -> Match
1. Foo 11. Bar -> Match

I know that I can match letters and spaces with [a-zA-Z ]+ and numbers followed by a dot with \d+(?=\.). But when I put the latter inside the character class of the former, it matches all digits, as well as the literal characters '+', '(', '?', '=', '.' and ')'.

Is there a way to achieve this?


Solution

  • Character classes are meant to match single characters, so you cannot put zero-width assertions inside them. Notice how \b loses its word-boundary meaning inside a character class and matches a backspace character instead (in most regex flavors, at least).

    In this case, use an alternation instead: match zero or more occurrences of either one or more digits immediately followed by a dot, or a single character from the set you allow:

    ^(?:\d+\.|[a-zA-Z ])*$
    

    If an empty string is not allowed, replace * with + before the $ anchor.

    Details:

    • ^ - start of string
    • (?:\d+\.|[a-zA-Z ])* - zero or more repetitions of
      • \d+\. - one or more digits and then a .
      • [a-zA-Z ] - an ASCII letter or a regular space
    • $ - end of string.
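
To illustrate why the anchors in the breakdown above matter, here is a minimal sketch that runs the pattern through Matcher.find(), which, unlike String.matches(), accepts a partial match. The class name AnchorDemo and its helper methods are hypothetical names chosen for this example; only the patterns come from the answer.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // Anchored pattern from the answer.
    static final Pattern ANCHORED = Pattern.compile("^(?:\\d+\\.|[a-zA-Z ])*$");
    // Same body without anchors: find() can succeed on a partial or empty match.
    static final Pattern UNANCHORED = Pattern.compile("(?:\\d+\\.|[a-zA-Z ])*");
    // Variant with + instead of *, so an empty string is rejected.
    static final Pattern NON_EMPTY = Pattern.compile("^(?:\\d+\\.|[a-zA-Z ])+$");

    static boolean anchoredFind(String s) {
        return ANCHORED.matcher(s).find();
    }

    static boolean unanchoredFind(String s) {
        return UNANCHORED.matcher(s).find();
    }

    static boolean nonEmptyFind(String s) {
        return NON_EMPTY.matcher(s).find();
    }

    public static void main(String[] args) {
        String bad = "Foo 1 Bar";
        System.out.println("anchored:   " + anchoredFind(bad));   // whole string must fit
        System.out.println("unanchored: " + unanchoredFind(bad)); // the prefix "Foo " is enough
        System.out.println("empty, *:   " + anchoredFind(""));
        System.out.println("empty, +:   " + nonEmptyFind(""));
    }
}
```

Without the anchors, find() is satisfied by the leading "Foo " of "Foo 1 Bar", so the digit without a dot goes unnoticed.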

    In Java, you do not need ^ and $ with String.matches(), because it only succeeds when the entire string matches the pattern:

    text.matches("(?:\\d+\\.|[a-zA-Z ])*")
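
As a quick check, the following sketch runs the sample strings from the question through String.matches() and compares the result with the character-class attempt the question describes. The class name MatchesDemo, the method names, and the exact spelling of the NAIVE pattern are assumptions made for this example; the question does not show the literal pattern that was tried.

```java
public class MatchesDemo {
    // Pattern from the answer; matches() requires the whole string to match.
    static final String ALLOWED = "(?:\\d+\\.|[a-zA-Z ])*";
    // Assumed form of the failed attempt: the lookahead pasted into the class,
    // where '+', '(', '?', '=', '.' and ')' all become literal members.
    static final String NAIVE = "[a-zA-Z \\d+(?=\\.)]+";

    static boolean isValid(String text) {
        return text.matches(ALLOWED);
    }

    static boolean naiveIsValid(String text) {
        return text.matches(NAIVE);
    }

    public static void main(String[] args) {
        String[] samples = {
            "Foo", "Foo Bar", "Foo 1 Bar", "Foo 1. Bar", "Foo 11. Bar", "1. Foo 11. Bar"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (isValid(s) ? "Match" : "No Match"));
        }
        // The naive class accepts a digit with no dot, and the assertion's own punctuation.
        System.out.println("naive, Foo 1 Bar: " + naiveIsValid("Foo 1 Bar"));
        System.out.println("naive, (?=+):     " + naiveIsValid("(?=+)"));
    }
}
```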