Search code examples
parsingpegrecursive-descent

Difference between a PEG and recursive descent parser?


I have recently come across PEG parsers, and Guido van Rossum's article on PEG parsers and how to construct them. That article talks about "PEG" parsers but internally it looks exactly like a recursive descent parser (generator). I have a feeling that that PEG parsers have something to do with generating recursive descent parsers, but am not sure.

What is the difference between a recursive descent parser and a PEG parser? and when should I use one or the other?


Solution

  • Short answer

    PEGs are grammars that describe recursive descent parsers.

    Longer answer

    When people talk about Parsing Expression Grammars (PEGs), they are often conflating three things:

    Bryan Ford (the creator of PEG) in his 2004 article describes the first two, but the first point is not a novel contribution. Rather, PEG is equivalent to top-down parsing language (TDPL) from the 1970s in terms of expressive power, but Ford borrows convenient aspects of EBNF and regular expression syntax to make grammars easier to read and write than the extremely minimal TDPL. Basically, PEG's notation makes TDPL more approachable, like writing code in C or Python instead of Assembly.

    In Ford's 2002 article based on his master's thesis, he also introduces the Packrat parsing algorithm which allows recursive descent parsers, even those with infinite lookahead like PEG, to run in linear time by memoizing, or caching, intermediate results. This is a theoretical result, however, and even if it helps with some pathological cases, in many cases the overhead of Packrat's memoization is significant. Parsing with PEGs without Packrat parsing is simply recursive descent parsing.

    One interesting thing about PEG's formal properties, compared to CFGs, is the prioritized choice operator (PEG notation uses / instead of EBNF's | for ambiguous choice). With prioritized choice, alternatives are attempted in order, and once an alternative succeeds, the others will not be attempted. Thus, a PEG, unlike a context-free grammar (CFG), is unambiguous; there is either one parse for an input, or no parses. Relatedly, PEGs are considered "analytical" grammars rather than "generative" grammars (e.g., CFG, which has its roots in linguistics for describing natural language utterances) as their purpose is for parsing rather than licensing (or generating) valid strings.

    Conclusion

    You don't really choose between PEG parsing and recursive descent parsing as they are about the same thing, but you may choose to use a PEG parsing library to implement your parser via a grammar instead of hand-writing the parsing functions. As Michael Dyck commented, however, PEGs are a subset of recursive descent parsers as you can write recursive descent parsers that go beyond what's representable in a PEG. Then again, many PEG libraries extend the original formalism with features such as semantic actions or additional syntactic constructs.