compiler-construction · lex · lexical-analysis

Using lex / creating scanner?


Recently, I have been studying compiler theory, specifically lexical analysis. As I understand it, there are several ways to construct a lexical scanner, such as:

  • Using Lex/Flex to generate the scanner automatically.
  • Building your own. In the examples I have encountered, this means a switch-case model with a read-ahead technique (simulating a DFA or NFA).

My question is: which of these is more suitable for implementing a basic programming language (consisting of variables, conditions, and loops)? How should they be used in practice? Is it possible to use both?


Solution

  • lex is useful for simple languages, but is not used for those with complicated syntax (for example, flex and lex do not use a scanner written in lex to process lex input). Occasionally someone asks how to manage multiple scanners in one program (long ago, I helped someone with a program built from multiple lex and yacc files, using sed to rename symbols, because this was before flex/bison offered options to help with renaming). For a concrete taste of lex in practice, a minimal flex specification is sketched at the end of this answer.

    For a practical demonstration, vi-like-emacs uses lex/flex for most of its syntax-highlighting modules, though not all of them. Perl and Ruby turn out to be too complex to bother with in lex/flex, due to the way they embed regular expressions with few contextual clues. On the other hand, it has a workable (but large) highlighter for lex/flex input, itself written in lex. For the sake of example, I added a copy of that to vile's FAQ (see result).

    One could extend that list, but as a rule the deciding factor for using lex or not is the complexity of the syntax. Managing error recovery is also (according to some) a reason for the choice. Both approaches from the question are sketched below.
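
    Here is a minimal sketch of the lex/flex route, assuming a toy language with identifiers, integer literals, and the keywords "if" and "while". The token codes (T_IF and so on) are made up for illustration, not taken from any real project:

        %{
        /* Toy scanner: identifiers, integers, two keywords.
           Token codes are illustrative assumptions. */
        #include <stdio.h>
        enum { T_IF = 256, T_WHILE, T_IDENT, T_NUMBER };
        %}
        %option noyywrap
        %%
        [ \t\n]+                { /* skip whitespace */ }
        "if"                    { return T_IF; }
        "while"                 { return T_WHILE; }
        [A-Za-z_][A-Za-z0-9_]*  { return T_IDENT; }
        [0-9]+                  { return T_NUMBER; }
        .                       { return yytext[0]; /* punctuation, operators */ }
        %%
        int main(void)
        {
            int tok;
            while ((tok = yylex()) != 0)
                printf("%d\t%s\n", tok, yytext);
            return 0;
        }

    Running flex on that file produces lex.yy.c, an ordinary C file you compile and link; the table-driven DFA is generated for you.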
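
    And here is a hand-written counterpart in C, showing the read-ahead technique the question mentions: the scanner reads one character past the end of each token and pushes the extra character back with ungetc. To keep it short, this sketch uses character-class tests instead of a full switch-on-state DFA; the token set is the same made-up one as above:

        #include <ctype.h>
        #include <stdio.h>
        #include <string.h>

        enum token { T_EOF, T_IF, T_WHILE, T_IDENT, T_NUMBER, T_OP };

        static char lexeme[64];            /* text of the current token */

        static enum token next_token(FILE *in)
        {
            int c;
            size_t n = 0;

            while ((c = fgetc(in)) != EOF && isspace(c))
                ;                          /* skip whitespace */
            if (c == EOF)
                return T_EOF;

            if (isalpha(c) || c == '_') {  /* identifier or keyword */
                do {
                    if (n < sizeof(lexeme) - 1)
                        lexeme[n++] = (char)c;
                    c = fgetc(in);         /* read one character ahead... */
                } while (c != EOF && (isalnum(c) || c == '_'));
                if (c != EOF)
                    ungetc(c, in);         /* ...and push the extra one back */
                lexeme[n] = '\0';
                if (strcmp(lexeme, "if") == 0)    return T_IF;
                if (strcmp(lexeme, "while") == 0) return T_WHILE;
                return T_IDENT;
            }

            if (isdigit(c)) {              /* integer literal */
                do {
                    if (n < sizeof(lexeme) - 1)
                        lexeme[n++] = (char)c;
                    c = fgetc(in);
                } while (c != EOF && isdigit(c));
                if (c != EOF)
                    ungetc(c, in);
                lexeme[n] = '\0';
                return T_NUMBER;
            }

            lexeme[0] = (char)c;           /* anything else: one-char token */
            lexeme[1] = '\0';
            return T_OP;
        }

        int main(void)
        {
            enum token t;
            while ((t = next_token(stdin)) != T_EOF)
                printf("%d\t%s\n", (int)t, lexeme);
            return 0;
        }

    The single character of pushback is the whole read-ahead trick: the loop collecting an identifier cannot know the token has ended until it sees a character that does not belong, and ungetc returns that character to the stream for the next call.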