
Why doesn't this Java regular expression work?


I need to create a regular expression that allows a string to contain any number of:

  • alphanumeric characters
  • spaces
  • (
  • )
  • &
  • .

No other characters are permitted. I used RegexBuddy to construct the following regex, which works correctly when I test it within RegexBuddy:

\w* *\(*\)*&*\.*

Then I used RegexBuddy's "Use" feature to convert this into Java code, but it doesn't appear to work correctly in a simple test program:

public class RegexTest
{
  public static void main(String[] args)
  {
    String test = "(AT) & (T)."; // Should be valid
    System.out.println("Test string matches: "
      + test.matches("\\w* *\\(*\\)*&*\\.*")); // Outputs false
  }
}
I must admit that I have a bit of a blind spot when it comes to regular expressions. Can anyone explain why it doesn't work, please?
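A minimal sketch, using a hypothetical class name, that runs the question's pattern against the question's input in two ways: Matcher.matches() (what String.matches() does, requiring the whole input to match) and Matcher.find() (which accepts a match anywhere in the input). The assumption here is that a regex tester such as RegexBuddy highlights matches anywhere in the subject text, which would explain why the pattern looked correct there.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVersusMatches
{
  // The pattern from the question, written as a Java string literal
  static final String QUESTION_REGEX = "\\w* *\\(*\\)*&*\\.*";

  // True only if the pattern consumes the entire input (same as String.matches())
  static boolean matchesWhole(String input)
  {
    return Pattern.compile(QUESTION_REGEX).matcher(input).matches();
  }

  // Returns the first substring the pattern matches anywhere in the input, or null
  static String firstMatch(String input)
  {
    Matcher m = Pattern.compile(QUESTION_REGEX).matcher(input);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args)
  {
    String test = "(AT) & (T).";
    System.out.println("matches(): " + matchesWhole(test));        // false
    System.out.println("find():    \"" + firstMatch(test) + "\""); // "("
  }
}
```

find() succeeds on the leading "(", but matches() fails because the pattern cannot continue past it to the "A".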

Solution

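  • That regular expression only accepts characters in a fixed order: any number of word characters (letters, digits and underscore), followed by any number of spaces, followed by any number of open parens, followed by any number of close parens, followed by any number of ampersands, followed by any number of periods. String.matches() requires the entire string to fit that sequence, so "(AT) & (T)." fails as soon as the "A" follows the "(". It most likely appeared to work in RegexBuddy because RegexBuddy highlights matches anywhere in the test text (such as the leading "("), rather than requiring the whole string to match.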

    What you want is...

    test.matches("[\\w \\(\\)&\\.]*")
    

    As mentioned by mmyers, this allows the empty string. If you do not want to allow the empty string...

    test.matches("[\\w \\(\\)&\\.]+")
    

    That will still allow a string consisting only of spaces, only periods, etc. If you want to ensure at least one alphanumeric character...

    test.matches("[\\w \\(\\)&\\.]*\\w+[\\w \\(\\)&\\.]*")
    

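    To understand what the regular expression is saying: anything within square brackets ("[]") defines a set of characters (a character class). So, where "a*" means 0 or more a's, [abc]* means 0 or more characters, each of which is an a, b, or c. Inside the brackets, (, ) and . lose their special meaning, so the backslashes in front of them are harmless but not required.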
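A small test harness for the three patterns from the answer, run against a few sample inputs. The class name, the sample inputs, and the STRICT_ALNUM variant are illustrative assumptions, not part of the original answer. STRICT_ALNUM is included because \w also matches the underscore, so the answer's patterns accept "snake_case" even though the question only asks for alphanumeric characters.

```java
public class AllowedCharsTest
{
  // Zero or more allowed characters (accepts the empty string)
  static final String ANY = "[\\w \\(\\)&\\.]*";
  // One or more allowed characters (rejects the empty string)
  static final String NON_EMPTY = "[\\w \\(\\)&\\.]+";
  // Allowed characters with at least one word character somewhere
  static final String HAS_WORD_CHAR = "[\\w \\(\\)&\\.]*\\w+[\\w \\(\\)&\\.]*";
  // Hypothetical variant: letters and digits only (no underscore); ( ) . need no escaping inside []
  static final String STRICT_ALNUM = "[A-Za-z0-9 ()&.]*";

  public static void main(String[] args)
  {
    String[] inputs = { "(AT) & (T).", "", "   ", "AT&T!", "snake_case" };
    for (String s : inputs)
    {
      System.out.printf("%-14s any=%-5b nonEmpty=%-5b hasWordChar=%-5b strict=%b%n",
          "\"" + s + "\"", s.matches(ANY), s.matches(NON_EMPTY),
          s.matches(HAS_WORD_CHAR), s.matches(STRICT_ALNUM));
    }
  }
}
```

"AT&T!" is rejected by every pattern because "!" is not in the character class, and "   " passes NON_EMPTY but not HAS_WORD_CHAR.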