Search code examples
regexregular-languagecomputation-theorydfa

Why does Regexp have a timeout method, while in theory they shouldn't?


This is a theoretical Computer Science question (Computation Theory).

I know that RegExps can take a very long time to calculate. However, from Theory of Computation we know that matching with a Regular Expression can be done extremely fast in a few clock cycles.

If RegExps are equivalent to Finite Automata, why RegExps have (or require) a timeout method? Using a DFA, the computation time for matching can be exteremely fast.

By RegExps I mean the Regular Expressions pattern matching classes in major languages; JavaScript, C#, etc.

Are common RegExps ("regex"s) not equivalent to the Regular Expressions in Theory of Automata (i.e. Regular Languages)?

For examples see: How do I timeout Regex operations to prevent hanging in .NET 4.5? and Regex Pattern Catastrophic backtracking .

If Regexp's matching require Backtracking, it means they are not equivalent to Regular Expressions.

If the languages captured by "Regexp"s are not Regular Languages, historically why (out of which necessity) were they extended?

If it that the resulting DFA will require a huge set of states?


Solution

  • A good reason is catastrophic backtracking, which explains why matching of some regexes will not return before the heat death of the universe.