yacc/lex or hand-coding?


I am working on a new programming language, and I have always been puzzled by the fact that everyone seems to use yacc/lex to parse code, while I do not.

My compiler (which is already working) is hand-coded in C++/STL, and I cannot say it is complex or that it took too much time. It has both a lexer and a parser of sorts, but they are not autogenerated.

Earlier, I wrote a C compiler (not the full spec) the same way. It was able to compile a program in one pass, resolving back references and preprocessing along the way; this is definitely impossible with yacc/lex.

I just cannot convince myself to scrap all this and start diving into yacc/lex, which might take quite an effort to adopt and might introduce some grammar limitations.

Am I missing something by not using yacc/lex? Am I doing something evil?


Solution

  • Yacc is inflexible in some ways:

    • good error handling is hard (basically, its algorithm is only defined to parse a correct string correctly; otherwise, all bets are off. This is one of the reasons that GCC moved to a hand-written parser)
    • context-dependency is hard to express, whereas with a hand-written recursive descent parser you can simply add a parameter to the functions

    Furthermore, I have noticed that lex/yacc object code is often bigger than that of a hand-written recursive descent parser (source code size tends to be the other way round).

    I have not used ANTLR, so I cannot say whether it is better on these points.
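
To illustrate the point about context-dependency, here is a minimal sketch of a hand-written recursive descent parser that passes context down as a function parameter. Everything in it is made up for illustration: the toy grammar (`nop`, `break`, `loop { ... }`), the whitespace-separated tokens, and the names `Parser`, `parse_stmt`, `in_loop` and `check` are assumptions, not anything from a real compiler. The rule it enforces is that `break` is only legal inside a loop body.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy grammar, tokens separated by whitespace:
//   stmt := "nop" ";" | "break" ";" | "loop" "{" stmt* "}"
// "break" is only legal inside a loop body, which is a context-dependent rule.
class Parser {
public:
    explicit Parser(const std::string& src) {
        std::istringstream in(src);
        for (std::string tok; in >> tok;) tokens_.push_back(tok);
    }

    // Parses the whole input; throws std::runtime_error on the first error.
    void parse_program() {
        while (pos_ < tokens_.size()) parse_stmt(/*in_loop=*/false);
    }

private:
    // The context travels down the call chain as an ordinary parameter.
    void parse_stmt(bool in_loop) {
        const std::string tok = next();
        if (tok == "nop") {
            expect(";");
        } else if (tok == "break") {
            if (!in_loop) fail("'break' outside of a loop");
            expect(";");
        } else if (tok == "loop") {
            expect("{");
            while (peek() != "}") parse_stmt(/*in_loop=*/true);
            expect("}");
        } else {
            fail("unexpected token '" + tok + "'");
        }
    }

    // Returns the current token, or a sentinel once the input is exhausted.
    const std::string& peek() const {
        static const std::string eof = "<eof>";
        return pos_ < tokens_.size() ? tokens_[pos_] : eof;
    }

    // Returns the current token and advances past it.
    std::string next() {
        std::string tok = peek();
        if (pos_ < tokens_.size()) ++pos_;
        return tok;
    }

    void expect(const std::string& want) {
        const std::string got = next();
        if (got != want) fail("expected '" + want + "' but found '" + got + "'");
    }

    // Reports the index of the next unread token along with the message.
    [[noreturn]] void fail(const std::string& msg) const {
        throw std::runtime_error("token " + std::to_string(pos_) + ": " + msg);
    }

    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};

// Returns an empty string on success, otherwise the error message.
std::string check(const std::string& src) {
    try {
        Parser(src).parse_program();
        return "";
    } catch (const std::runtime_error& e) {
        return e.what();
    }
}
```

With this sketch, `check("loop { break ; }")` returns an empty string, while `check("break ;")` returns a message saying that `break` appeared outside of a loop. Because every error is raised at a specific point in ordinary code, the message can be as specific as you like, which ties in with the error-handling point as well.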