Search code examples
theorycontext-free-grammardeterministiclr-grammarchomsky-hierarchy

Chomsky Hierarchy: LR(k) grammars vs Deterministic CFGs?


We are learning the chomsky hiearchy in my introduction to computer science course. My professor has mention lrk grammars multiple times, but they're not taught in the book. From my understanding, they are a subset of deterministic context free grammars that generate unambiguous languages. But how are they different from deterministic CFGs?

Here is the Chomsky hierarchy we've gone over in class with the devices that recognize the associated grammar:

recursively enumerable - all turing machines
recursive - deciders/TMs that halt on every input
context sensitive - Linear-bounded non-deterministic Turing machine
context free - nondeterministic PDA
deterministic context free - deterministic PDA 
LRK grammar - deterministic PDA
regular - DFAs/NFAs

On a separate note (please kindly let me know in comments if this question should be separate post) - how are linear-bounded non-deterministic Turing machine different from deciders?


Solution

  • What's tricky here is that there are two parallel hierarchies that are related but not exactly the same. There are the LR(k) grammars, which are classes of grammars with certain properties. We know that

    LR(0) ⊊ LR(1) ⊊ LR(2) ⊊ ...

    That is, as you increase k, larger and larger classes of grammars are included in the class LR(k).

    Independently, there are the LR(k) languages, which are languages for which an LR(k) grammar exists for some choice of k. There's a cool theorem from Don Knuth that shows that a language has an LR(k) grammar for some k if and only if it has an LR(1) grammar. So in that sense, the LR(k) languages are "languages for which you can make an LR(1) grammar."

    Then there's the deterministic context-free languages (DCFLs), which are languages for which you can build a deterministic PDA. It's known that the DCFLs are precisely the same as the LR(k) languages - that is, a language is a deterministic CFL if and only if there's an LR(1) grammar for it.

    So what does this mean for the hierarchy of languages? It looks something like this, from least powerful / most restrictive to most powerful / least restrictive:

    • Regular languages
      • Described by right-linear grammars, left-linear grammars, DFAs, NFAs, regular expressions, and prefix grammars
    • Deterministic CFLs
      • Described by deterministic PDAs and LR(k) grammars
    • (Nondeterministic) CFLs
      • Described by (non)deterministic PDAs and CFGs.
    • Context-sensitive languages
      • Described by linear-bounded automata and context-sensitive grammars
    • Recursive languages
      • Languages accepted by some decider, languages where both the language and the complement are recursively enumerable
    • Recursively enumerable languages
      • Languages of unrestricted grammars; languages of Turing machines; languages that can be verified by Turing machines; languages of enumerators; etc.