Search code examples
parsingyacclex

PLY for C source language


Can I use python lex yacc for lexical analyzer and parser for C source language? Actually it has been written that yacc uses uses LALR-parsing, PLY uses LR-parsing, which is reasonably efficient and well suited for larger grammars, but slightly restricts the types of grammars that can be successfully parsed. Now I am currently doing compiler course and still to do parsing part. So don't know much about LALR parsing.


Solution

  • The GCC compiler was implemented using Bison, an LALR parser generator, for many years. LR is stronger than LALR, so you can technically do it.

    Now, whether you want to do it is another question. LALR doesn't help with certain nastly (wow, a trumpism like "bigly") features of C, and various lexer hacks were used to make it work. (See my SO answer on Why C/C++ can't be parsed with pure LR parsers: https://stackoverflow.com/a/1004737/120163). Still, it was useful for a long time.

    Now, it sounds like you are doing a compiler class. In this case, you probably are not implementing "all of C", but rather an interesting subset/variant. In that case, you should be able to design your "C-like" grammar by bending it away from C's trouble spots and get on with your class. There is little point for your class in learning how to hack LALR/LR parsers to handle strange syntax problems. What you need to learn in the class is what parsers do, and how they fit into the overall structure of a compiler; adding weirdness won't improve learning the basics. If you finish the class, and go work on building parsers for real languages, you'll encounter these problems soon enough and can deal with them then.

    LALR parsing is just fine if you get to decide the langauge syntax.