java  regex  string-matching  regex-negation

Combining or and negation in Java regex?


I am trying to combine "not" and "or" in a regex so that I get the following match results:

"blah" matching "zero or more of" : "not h"         or  "any in b,l,a" = false 
"blah" matching "zero or more of" : "any in b,l,a"  or  "not h"        = false  
"blah" matching "zero or more of" : "not n"         or  "any in b,l,a" = true  
"blah" matching "zero or more of" : "any in b,l,a"  or  "not n"        = true  

I have tried the following regular expressions, but they don't seem to achieve what I am looking for. I've also included my interpretation of the regexes:

//first set attempt - turns out to be any of the characters within?
System.out.println("blah".matches("[bla|^h]*"));    //true
System.out.println("blah".matches("[^h|bla]*"));    //false
System.out.println("blah".matches("[bla|^n]*"));    //false
System.out.println("blah".matches("[^n|bla]*"));    //false
//second set attempt - turns out to be the literal text
System.out.println("blah".matches("(bla|^h)*"));    //false
System.out.println("blah".matches("(^h|bla)*"));    //false
System.out.println("blah".matches("(bla|^n)*"));    //false
System.out.println("blah".matches("(^n|bla)*"));    //false
//third set attempt - almost gives the right results, but it's still off somehow
System.out.println("blah".matches("[bla]|[^h]*"));  //false
System.out.println("blah".matches("[^h]|[bla]*"));  //false
System.out.println("blah".matches("[bla]|[^n]*"));  //true
System.out.println("blah".matches("[^n]|[bla]*"));  //false

So, in the end, I am wondering about the following:

  1. Are my interpretations of the above regexes correct?
  2. What is a group of four Java regexes that match my specification?
  3. (Optional) Am I making other mistakes in my regex?

With regard to fuzzy requirements, I'd just like to make the following point:
The regex subdivisions could have been something like ("not [abc]" or "bc")*, which would match any string like bcbc..., or any string whose characters are not a, b, or c. I just chose "blah" as a general example, like "foo" or "bar".


Solution

  • For the first two conditions, you may use:

    ^(?:[bla]|[^h])*$
    

    And for the next two, you may use:

    ^(?:[bla]|[^n])*$
    

    RegEx Details:

    • ^: Start
    • (?:: Start non-capturing group
      • [bla]: Match one of b, l, or a
      • |: OR
      • [^h]: Match any char that is not h
    • )*: End non-capturing group, match 0 or more repetitions of this group
    • $: End
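
The breakdown above can be checked with a small sketch. The class name OrNegationDemo and the constant names are made up for illustration. The two patterns with the alternatives swapped are not part of the answer; they are included on the assumption that the order of the alternatives does not change whether the whole string matches, only the order in which they are tried.

```java
final class OrNegationDemo {
    // The two patterns given in the answer.
    static final String BLA_OR_NOT_H = "^(?:[bla]|[^h])*$";
    static final String BLA_OR_NOT_N = "^(?:[bla]|[^n])*$";

    // Same alternatives in the opposite order (added for illustration only).
    static final String NOT_H_OR_BLA = "^(?:[^h]|[bla])*$";
    static final String NOT_N_OR_BLA = "^(?:[^n]|[bla])*$";

    public static void main(String[] args) {
        // 'h' is neither in [bla] nor in [^h], so the final character cannot be consumed.
        System.out.println("blah".matches(NOT_H_OR_BLA)); // false
        System.out.println("blah".matches(BLA_OR_NOT_H)); // false

        // Every character of "blah" is "not n", so the whole string is consumed.
        System.out.println("blah".matches(NOT_N_OR_BLA)); // true
        System.out.println("blah".matches(BLA_OR_NOT_N)); // true
    }
}
```

Since b, l, and a are themselves "not h", the [bla] alternative adds nothing in this particular example. It would matter for a specification such as "any of b, l, a, h" or "not h".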

    Note that String.matches() only succeeds when the entire input matches the pattern, so the anchors are implicit and you can omit ^ and $.
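
To illustrate the note about implicit anchors, here is a minimal sketch contrasting Matcher.matches() (whole-input match, which is what String.matches() does) with Matcher.find() (substring search). The class ImplicitAnchorsDemo and its helper methods wholeMatch and partialMatch are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

final class ImplicitAnchorsDemo {
    // Succeeds only if the pattern consumes the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Succeeds if the pattern matches anywhere in the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String unanchored = "(?:[bla]|[^h])*";
        String anchored = "^(?:[bla]|[^h])*$";

        // With matches(), the anchored and unanchored forms behave the same.
        System.out.println(wholeMatch(unanchored, "blah")); // false
        System.out.println(wholeMatch(anchored, "blah"));   // false

        // With find(), the anchors matter: the unanchored form finds "bla".
        System.out.println(partialMatch(unanchored, "blah")); // true
        System.out.println(partialMatch(anchored, "blah"));   // false
    }
}
```

If the pattern is later reused with find() or in another tool, keeping the explicit ^ and $ avoids accidental substring matches.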