Search code examples
complexity-theoryregular-language

Polynomial time algorithm for computing the size of the DFA describing the intersection of two regular expressions?


The DFA describing the intersection of two regular expressions can be exponentially large compared to the DFAs of the regular expressions themselves. (Here's a nice Python library for computing it.) Is there a way to compute the size of the DFA for the intersection without needing exponential resources?


Solution

  • From Wikipedia:

    Universality: is LA = Σ* ? […] For regular expressions, the universality problem is NP-complete already for a singleton alphabet.

    If I'm reading that right, it says that the problem of determining whether a regular expression generates all strings is known to be NP-complete.

    Now, for your problem: consider the case where the two input regular expressions are known to generate the same regular language (perhaps the expressions are identical). Then your problem reduces to this: what is the size of the DFA for this RE? It is relatively straightforward to tell whether a RE generates at least some strings (i.e., whether the language is empty). If the language is not empty, then the minimal DFA corresponding to the RE has one state if and only if the RE generates all strings.

    So, if your problem had a general polynomial-time solution, you'd be able to solve universality for regular expressions, which Wikipedia says is not possible.

    (If you're not asking about minimal DFAs, but the DFAs produced by a specific minimization technique, I think you'd have to specify the minimization technique).