design-patterns abstract-syntax-tree interpreter state-machine

Why is the Interpreter design pattern not applicable when efficiency is a critical concern?

In the book Design Patterns, written by the Gang of Four, the following is stated in the 'Applicability' section of the Interpreter pattern:

efficiency is not a critical concern. The most efficient interpreters are usually not implemented by interpreting parse trees directly but by first translating them into another form. For example, regular expressions are often transformed into state machines.

Why is interpreting parse trees directly not efficient?
How do state machines help in improving this efficiency?
Are state machines more efficient then an Abstract Syntax Tree for 'interpretation' of a sentence? If so, why?

I viewed AST and state machine articles on the internet:

Unfortunately, I wasn't able to find the answer to the questions stated in the problem section. The reason for that could be that I don't fully comprehend the 'Applicability' section of the Interpreter pattern. As a result not formulating the right questions.

Solution

The most efficient interpreters are usually not implemented by interpreting parse trees directly but by first translating them into another form.

This is almost certainly true, but please note that it doesn't specify what the other form is, and nor does it imply that there is a single other form that applies to any parse tree or problem domain.

For example, regular expressions are often transformed into state machines.

This is an example of the above. Here the problem domain is simple pattern matching where the pattern was originally specified using a regular expression. In this particular case, a state machine is generally much more efficient than trying to emulate the regular expression as an NFA (since that's what the parse tree represents), because the state machine can be followed using a single table lookup at each character position with a single auxiliary variable (representing the state). That algorithm is guaranteed linear time and constant space. Emulating the NFA, on the other hand, either requires backtracking on failure, which could lead to exponential time, or it requires maintaining a set of possible states, which means that the auxiliary storage is the size of the regular expression and the time to handle a single input character is also proportional to the size of the regular expression. (That's an approximation; each state in the NFA corresponds to a position in the regular expression, so the state count and the length of the regular expression are linearly related but not identical. But it's close enough for O-notation.)

Does that mean that state machines are always a more efficient representation than a syntax tree? Certainly not. If the problem domain were computing the value of an arithmetic expression, then state machines are complete irrelevant. But the parse tree is still not the most efficient way of evaluating the expression. Evaluating the parse tree requires a recursive traversal over the expression, for every evaluation. That means extensive use of the processor stack and lots of procedure calls, none of which is actually necessary. Converting the parse tree into a simple vector of "three-address" instructions (effectively, a virtual machine) is a very simple operation and the resulting vector of instructions can be evaluated very rapidly. (If the language being evaluated has functions, you will need a stack, but if they are just algebraic expressions with only built-in functions, then you can probably compute the needed size of an array of temporaries. You can use indices into that array as addresses in the three-address code.)

Does that mean that a virtual machine program is always a more efficient representation than a syntax tree? Again, no, although it's probably a more generally applicable technique than the state machine. (And the state machine could be considered a sort of virtual machine program, if you squint your eyes and allow a little bit of hand-waving.)

So the bottom line is that if you want efficient evaluation, you need to tailor the data structure you use to represent the semantics of an input sentence to what you know about the problem domain and the likely control flow patterns which will arise.