Search code examples
compiler-constructionlexical-analysisdfa

Compiler Construction - Why does some tokens requires a final state with a backtrack?


So i'm studying compiler construction for an exam , but there seems something i can't understand.

Let's assume the following set of terminals:

T = {:, *, =, (, ), <, >, {, }, [a..z], [0..9]}

And the set of accepted tokens (l points to a letter , and d points to a digit):

A = { l(l|d)* , (d)+ , { (anything)* } , : , := , < , <= , <> , > , >= , ( , (* , * , *) } 

Here's the state transition diagram:

State transition diargram

Now i understand that state number 20 needs backtracking because:

  • A ( might come , but it might be followed by a * .
  • A : might come , but it might be followed by a = .
  • A < might come , but it might be followed by a = or > .
  • A > might come , but it might be followed by a = .

But what about states 3 , 5 , and 11 ? , why do we need to backtrack them ?.


Solution

  • It looks like "backtracking" here means you've read a character, but you haven't consumed it. So for state 3 we've consumed a l and maybe some l and d, and then we get some character that is neither; that character terminates the l(l|d)* rule, but then still needs to be processed.

    Contrast that with state 7 where we read a } character to terminate the rule, and we've used up the } so we don't need to "backtrack."

    This explanation is consistent with states 3, 5, and 20, but I don't get why 11 needs to backtrack.