Search code examples
parsingembeddedbisonflex-lexeravr-gcc

Writing a parser like Flex/Bison that is usable on 8-bit embedded systems


I'm writing a small interpreter for a simple BASIC like language as an exercise on an AVR microcontroller in C using the avr-gcc toolchain.

If I were writing this to run on my Linux box, I could use flex/bison. Now that I restricted myself to an 8-bit platform, how would I code the parser?


Solution

  • I've implemented a parser for a simple command language targeted for the ATmega328p. This chip has 32k ROM and only 2k RAM. The RAM is definitely the more important limitation -- if you aren't tied to a particular chip yet, pick one with as much RAM as possible. This will make your life much easier.

    At first I considered using flex/bison. I decided against this option for two major reasons:

    • By default, Flex & Bison depend on some standard library functions (especially for I/O) that aren't available or don't work the same in avr-libc. I'm pretty sure there are supported workarounds, but this is some extra effort that you will need to take into account.
    • AVR has a Harvard Architecture. C isn't designed to account for this, so even constant variables are loaded into RAM by default. You have to use special macros/functions to store and access data in flash and EEPROM. Flex & Bison create some relatively large lookup tables, and these will eat up your RAM pretty quickly. Unless I'm mistaken (which is quite possible) you will have to edit the output source in order to take advantage of the special Flash & EEPROM interfaces.

    After rejecting Flex & Bison, I went looking for other generator tools. Here are a few that I considered:

    You might also want to take a look at Wikipedia's comparison.

    Ultimately, I ended up hand coding both the lexer and parser.

    For parsing I used a recursive descent parser. I think Ira Baxter has already done an adequate job of covering this topic, and there are plenty of tutorials online.

    For my lexer, I wrote up regular expressions for all of my terminals, diagrammed the equivalent state machine, and implemented it as one giant function using goto's for jumping between states. This was tedious, but the results worked great. As an aside, goto is a great tool for implementing state machines -- all of your states can have clear labels right next to the relevant code, there is no function call or state variable overhead, and it's about as fast as you can get. C really doesn't have a better construct for building static state machines.

    Something to think about: lexers are really just a specialization of parsers. The biggest difference is that regular grammars are usually sufficient for lexical analysis, whereas most programming languages have (mostly) context-free grammars. So there's really nothing stopping you from implementing a lexer as a recursive descent parser or using a parser generator to write a lexer. It's just not usually as convenient as using a more specialized tool.