Search code examples
regexregular-languagecontext-free-language

Can regex be used to recognize any Context Free Language?


I know that regex packages can recognize a wider set of languages than just Regular Languages, but the use of recursive regex in Python regex to find arithmetic expressions in text strings makes me wonder if it is possible to recognize any Context Free Language using regex, and if not, could someone provide a counter example?


Solution

  • Basically this answer is taken from this great blog post.

    So the short answer is that regular expressions with a recursive extension can recognize any context free grammar.

    To show this the idea is to show a way which constructs a regex from a context free grammar.

    (?<name> ...) defines a regex pattern which can later be reused with (?&name).

    Any context free grammar can be written as a set of rules of the following forms:

    • A -> BC
    • A -> a

    If we can write these rules as regex the regex can recognize any context free language. The only interesting rule here is the first one.

    Firstly, if the rule is left-recursive we need to rewrite it to a right-recursive rule since regex only support right recursion. This rewrite is always possible. Now we can write all such rules as follows:

    A -> BC
    A -> DE
    (?<A>(?&B)(?&C)|(?&D)(?&E))
    

    This allows the definition of arbitrary CFG rules, so we only need to define them all and then match the initial rule.

    (?(DEFINE)define rules here)^(?&initial)$
    

    Here the (?(DEFINE)...) declares rules without matching and the initial refers to the initial rule of the grammar.

    It has been some time since I heard theoretical CS courses so please correct me if there are mistakes :)