Search code examples
regexfsm

How to determine if a regex is orthogonal to another regex?


I guess my question is best explained with an (simplified) example.

Regex 1:

^\d+_[a-z]+$

Regex 2:

^\d*$

Regex 1 will never match a string where regex 2 matches. So let's say that regex 1 is orthogonal to regex 2.

As many people asked what I meant by orthogonal I'll try to clarify it:

Let S1 be the (infinite) set of strings where regex 1 matches. S2 is the set of strings where regex 2 matches. Regex 2 is orthogonal to regex 1 iff the intersection of S1 and S2 is empty. The regex ^\d_a$ would be not orthogonal as the string '2_a' is in the set S1 and S2.

How can it be programmatically determined, if two regexes are orthogonal to each other?

Best case would be some library that implements a method like:

/**
 * @return True if the regex is orthogonal (i.e. "intersection is empty"), False otherwise or Null if it can't be determined
 */
public Boolean isRegexOrthogonal(Pattern regex1, Pattern regex2);

Solution

  • I finally found exactly the library that I was looking for:

    dk.brics.automaton

    Usage:

    /**
     * @return true if the two regexes will never both match a given string
     */
    public boolean isRegexOrthogonal( String regex1, String regex2 ) {
       Automaton automaton1 = new RegExp(regex1).toAutomaton();
       Automaton automaton2 = new RegExp(regex2).toAutomaton();
       return automaton1.intersection(automaton2).isEmpty();
    }
    

    It should be noted that the implementation doesn't and can't support complex RegEx features like back references. See the blog post "A Faster Java Regex Package" which introduces dk.brics.automaton.