compiler-construction · lex · lexical-analysis

Using lex / creating scanner?


Recently, I have been studying compiler theory, specifically lexical analysis. As I understand it, there are several ways to construct a lexical scanner, such as:

  • Using Lex/Flex to generate the scanner automatically.
  • Building your own. In the examples I have encountered, this means a switch-case model with a read-ahead technique (simulating a DFA or NFA).

My question is: which of these is more suitable for implementing a basic programming language (consisting of variables, conditions, and loops)? How should they be used in practice? Is it possible to use both?


Solution

  • lex is useful for simple languages, but is not used for those with complicated syntax (for example, flex and lex do not use a scanner written in lex to process lex input). Occasionally someone asks how to manage multiple scanners in one program (long ago, I helped someone with a program built from multiple lex and yacc files, using sed to rename symbols, because this was before flex/bison offered options to help with renaming). For a concrete taste of lex in practice, a minimal flex specification is sketched at the end of this answer.

    For a practical demonstration, vi-like-emacs uses lex/flex for most of its syntax-highlighting modules, though not all of them. Perl and Ruby turn out to be too complex to bother with in lex/flex, due to the way they embed regular expressions with few contextual clues. On the other hand, it has a workable (but large) highlighter for lex/flex input, itself written in lex. For the sake of example, I added a copy of that to vile's FAQ (see result).

    One could extend that list, but as a rule the deciding factor for using lex or not is the complexity of the syntax. Managing error recovery is also (according to some) a reason for the choice. Both approaches from the question are sketched below.
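
    Here is a minimal sketch of the lex/flex route, assuming a toy language with identifiers, integer literals, and the keywords "if" and "while". The token codes (T_IF and so on) are made up for illustration, not taken from any real project:

        %{
        /* Toy scanner: identifiers, integers, two keywords.
           Token codes are illustrative assumptions. */
        #include <stdio.h>
        enum { T_IF = 256, T_WHILE, T_IDENT, T_NUMBER };
        %}
        %option noyywrap
        %%
        [ \t\n]+                { /* skip whitespace */ }
        "if"                    { return T_IF; }
        "while"                 { return T_WHILE; }
        [A-Za-z_][A-Za-z0-9_]*  { return T_IDENT; }
        [0-9]+                  { return T_NUMBER; }
        .                       { return yytext[0]; /* punctuation, operators */ }
        %%
        int main(void)
        {
            int tok;
            while ((tok = yylex()) != 0)
                printf("%d\t%s\n", tok, yytext);
            return 0;
        }

    Running flex on that file produces lex.yy.c, an ordinary C file you compile and link; the table-driven DFA is generated for you.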
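
    And here is a hand-written counterpart in C, showing the read-ahead technique the question mentions: the scanner reads one character past the end of each token and pushes the extra character back with ungetc. To keep it short, this sketch uses character-class tests instead of a full switch-on-state DFA; the token set is the same made-up one as above:

        #include <ctype.h>
        #include <stdio.h>
        #include <string.h>

        enum token { T_EOF, T_IF, T_WHILE, T_IDENT, T_NUMBER, T_OP };

        static char lexeme[64];            /* text of the current token */

        static enum token next_token(FILE *in)
        {
            int c;
            size_t n = 0;

            while ((c = fgetc(in)) != EOF && isspace(c))
                ;                          /* skip whitespace */
            if (c == EOF)
                return T_EOF;

            if (isalpha(c) || c == '_') {  /* identifier or keyword */
                do {
                    if (n < sizeof(lexeme) - 1)
                        lexeme[n++] = (char)c;
                    c = fgetc(in);         /* read one character ahead... */
                } while (c != EOF && (isalnum(c) || c == '_'));
                if (c != EOF)
                    ungetc(c, in);         /* ...and push the extra one back */
                lexeme[n] = '\0';
                if (strcmp(lexeme, "if") == 0)    return T_IF;
                if (strcmp(lexeme, "while") == 0) return T_WHILE;
                return T_IDENT;
            }

            if (isdigit(c)) {              /* integer literal */
                do {
                    if (n < sizeof(lexeme) - 1)
                        lexeme[n++] = (char)c;
                    c = fgetc(in);
                } while (c != EOF && isdigit(c));
                if (c != EOF)
                    ungetc(c, in);
                lexeme[n] = '\0';
                return T_NUMBER;
            }

            lexeme[0] = (char)c;           /* anything else: one-char token */
            lexeme[1] = '\0';
            return T_OP;
        }

        int main(void)
        {
            enum token t;
            while ((t = next_token(stdin)) != T_EOF)
                printf("%d\t%s\n", (int)t, lexeme);
            return 0;
        }

    The single character of pushback is the whole read-ahead trick: the loop collecting an identifier cannot know the token has ended until it sees a character that does not belong, and ungetc returns that character to the stream for the next call.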