compiler-construction, automata

Does a language compiler use a complex DFA to accept programs?


I am reading up on the Theory of Computation, and I have no practical experience writing a compiler.

So it occurred to me: does a C or Java compiler use a huge DFA to validate a program (a string, in TOC parlance)?

Are compilers practical implementations of DFAs?


Solution

  • Some compilers do, others don't. Those that use DFAs typically use scanner-generators like lex/flex to build the DFA.

    Of course, a DFA will only take you so far (up to a regular language, in fact). No practical programming language can be described by a regular expression, since regular expressions cannot handle recursive structures like parenthesized expressions or nested control-flow blocks. So the DFA, if any, will only be used to break the input into a sequence of tokens. The tokens will then be parsed by some kind of pushdown automaton, or by a recursive descent parser, or by pure black magic on the part of a coder. Again, the PDA (if any) may well be generated automatically, using a tool like bison or ANTLR, among many others.

    It's rare to find a language pure enough that a simple two-phase pipeline (a DFA scan followed by a PDA parse) will correctly create a parse tree. There seems to always be a temptation to add a syntactic construct which can only be parsed using a Turing-complete formalism. So in practical compilers, there will be places where the potentially elegant theoretical model has small holes drilled into it with spaghetti threaded through them.

    Despite all that, the theoretical study of parsing techniques has simplified compiler construction considerably over the years, as well as being a very beautiful and intriguing corner of mathematics.
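
To make the scan/parse split concrete, here is a minimal sketch in Python, assuming a toy expression language with only numbers, identifiers, `+`, `*`, and parentheses. The token kinds, the transition table, and the names `tokenize` and `parse` are all invented for illustration; this is not how lex/flex, bison, or any real C or Java compiler is implemented. The scanner is a hand-written table-driven DFA, and the parser is recursive descent, which uses the call stack to handle the nesting that a DFA cannot.

```python
import string

# Hypothetical token kinds for a toy expression language (illustration only).
NUM, IDENT, OP, LPAREN, RPAREN = "NUM", "IDENT", "OP", "LPAREN", "RPAREN"

# DFA states for multi-character tokens.
START, IN_NUM, IN_IDENT = "start", "in_num", "in_ident"


def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch in string.digits:
        return "digit"
    if ch in string.ascii_letters or ch == "_":
        return "letter"
    return "other"


# (state, input class) -> next state. A missing entry means "no transition".
TRANSITIONS = {
    (START, "digit"): IN_NUM,
    (START, "letter"): IN_IDENT,
    (IN_NUM, "digit"): IN_NUM,
    (IN_IDENT, "letter"): IN_IDENT,
    (IN_IDENT, "digit"): IN_IDENT,
}
ACCEPTING = {IN_NUM: NUM, IN_IDENT: IDENT}
SINGLE_CHAR = {"(": LPAREN, ")": RPAREN, "+": OP, "*": OP}


def tokenize(text):
    """Phase 1: split the input into (kind, text) tokens using the DFA."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        if ch in SINGLE_CHAR:
            tokens.append((SINGLE_CHAR[ch], ch))
            i += 1
            continue
        # Run the DFA as far as it will go (longest match).
        state, j = START, i
        while j < len(text) and (state, char_class(text[j])) in TRANSITIONS:
            state = TRANSITIONS[(state, char_class(text[j]))]
            j += 1
        if state not in ACCEPTING:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
        tokens.append((ACCEPTING[state], text[i:j]))
        i = j
    return tokens


def parse(tokens):
    """Phase 2: recursive descent over the token list.

    Grammar (assumed for this sketch):
        expr   := term ('+' term)*
        term   := factor ('*' factor)*
        factor := NUM | IDENT | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        pos += 1

    def expr():
        node = term()
        while peek() == (OP, "+"):
            advance()
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == (OP, "*"):
            advance()
            node = ("*", node, factor())
        return node

    def factor():
        kind, text = peek()
        if kind in (NUM, IDENT):
            advance()
            return text
        if kind == LPAREN:
            advance()
            node = expr()  # recursion handles arbitrarily deep nesting
            if peek()[0] != RPAREN:
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input after expression")
    return tree
```

Calling `parse(tokenize("(a + 2) * b"))` builds a nested tuple for the expression, while an input such as `"((a)"` passes the scanner without complaint and is only rejected by the parser. That is the point of the split: balanced parentheses are not a regular language, so the DFA alone cannot check them.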