I've been learning compiler principles recently. I notice that all the examples in textbooks describe a language's lexical analyzer using "lex" or "flex" with regular expressions to show how to analyze input source files.
Does that indicate that the lexical analysis of all known programming languages can be implemented with a type 3 grammar? Or is it just that the textbooks are using simple samples to show the ideas?
Most lexemes in most languages can be identified with regular expressions, but there are exceptions. (When it comes to parsing computer languages, there are always exceptions. Without exception.)
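For the common cases, a handful of flex rules really is all it takes. Here is a minimal sketch (the TOK_* token codes are hypothetical names, purely for illustration):

```
%option noyywrap
%{
/* Hypothetical token codes, for illustration only. */
enum { TOK_ID = 256, TOK_NUM, TOK_DIVASSIGN };
%}

%%
[ \t\n]+                { /* skip whitespace */ }
[A-Za-z_][A-Za-z0-9_]*  { return TOK_ID;  /* identifier */ }
[0-9]+(\.[0-9]+)?       { return TOK_NUM; /* numeric literal */ }
"/="                    { return TOK_DIVASSIGN; }
.                       { return yytext[0]; /* any other single character */ }
%%
```

Every pattern there is an honest type 3 expression, which is why the textbook presentation works as well as it does.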
For example, you cannot match a C++ raw string literal with a regex: the closing delimiter must repeat whatever delimiter was chosen after the opening quote (as in R"xyz(...)xyz"), and that sort of back-reference is beyond the power of a regular language. You cannot tell, without syntactic context, whether /=
in a JavaScript program is the single lexeme used to indicate divide-and-assign, or the start of a regular expression literal which matches a string starting with =. And languages which allow nested comments (unlike C) require something a bit more powerful than regular expressions.
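For the JavaScript case, one common workaround is to have the scanner itself track just enough context to decide. A rough sketch using a flex start condition (the TOK_* names are hypothetical, and the regex-literal pattern is deliberately crude):

```
%option noyywrap
%{
/* Hypothetical token codes, for illustration only. */
enum { TOK_ID = 256, TOK_NUM, TOK_DIVASSIGN, TOK_REGEX };
%}
/* Inclusive state: entered whenever the previous token could end an
   expression, so that a following '/' must mean division. */
%s AFTER_VALUE

%%
<AFTER_VALUE>"/="               { BEGIN(INITIAL); return TOK_DIVASSIGN; }
<AFTER_VALUE>"/"                { BEGIN(INITIAL); return '/'; }
<INITIAL>"/"([^/\\\n]|\\.)+"/"  { BEGIN(AFTER_VALUE); return TOK_REGEX; }
[A-Za-z_$][A-Za-z0-9_$]*        { BEGIN(AFTER_VALUE); return TOK_ID; }
[0-9]+                          { BEGIN(AFTER_VALUE); return TOK_NUM; }
[ \t\n]+                        { /* whitespace: leave the state alone */ }
.                               { BEGIN(INITIAL); return yytext[0]; }
%%
```

Real JavaScript lexers do essentially this, though the real decision is subtler (a / after a close parenthesis can be either, depending on what the parentheses belong to), so in practice the lexer and parser end up cooperating.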
But it's enormously easier to write a few regexes than to write a full state machine in raw C, so there is a lot of motivation to find ways of bending flex to your will for the few exceptional cases. And flex cooperates to a certain extent by providing features, start conditions among them, which allow you to escape from the regex straitjacket when necessary. In an advanced class on lexical analysis you might learn more about these features.
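Nested comments are the textbook use of that escape hatch: an exclusive start condition plus an ordinary C counter does what no single regex can. A minimal sketch for OCaml-style (* ... *) comments:

```
%option noyywrap
%x COMMENT
%{
static int comment_depth = 0;   /* current nesting level */
%}

%%
"(*"              { comment_depth = 1; BEGIN(COMMENT); }
<COMMENT>"(*"     { ++comment_depth; }
<COMMENT>"*)"     { if (--comment_depth == 0) BEGIN(INITIAL); }
<COMMENT>.|\n     { /* discard the comment's contents */ }
<COMMENT><<EOF>>  { fprintf(stderr, "unterminated comment\n"); return 0; }
%%
```

The depth counter is exactly the unbounded memory that takes the job out of type 3 territory; flex just lets you bolt it on.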