I guess my question is best explained with an (simplified) example.
Regex 1:
^\d+_[a-z]+$
Regex 2:
^\d*$
Regex 1 will never match a string where regex 2 matches. So let's say that regex 1 is orthogonal to regex 2.
As many people asked what I meant by orthogonal I'll try to clarify it:
Let S1 be the (infinite) set of strings where regex 1 matches. S2 is the set of strings where regex 2 matches. Regex 2 is orthogonal to regex 1 iff the intersection of S1 and S2 is empty. The regex ^\d_a$ would be not orthogonal as the string '2_a' is in the set S1 and S2.
How can it be programmatically determined, if two regexes are orthogonal to each other?
Best case would be some library that implements a method like:
/**
* @return True if the regex is orthogonal (i.e. "intersection is empty"), False otherwise or Null if it can't be determined
*/
public Boolean isRegexOrthogonal(Pattern regex1, Pattern regex2);
I finally found exactly the library that I was looking for:
Usage:
/**
* @return true if the two regexes will never both match a given string
*/
public boolean isRegexOrthogonal( String regex1, String regex2 ) {
Automaton automaton1 = new RegExp(regex1).toAutomaton();
Automaton automaton2 = new RegExp(regex2).toAutomaton();
return automaton1.intersection(automaton2).isEmpty();
}
It should be noted that the implementation doesn't and can't support complex RegEx features like back references. See the blog post "A Faster Java Regex Package" which introduces dk.brics.automaton.