Let's say I'm parsing hexadecimal numbers in flex. I have something like this:
%x hexnumber
%%
"0x" { BEGIN hexnumber; }
<hexnumber>[0-9A-F] { process_digit(); }
This works fine; the 0x prefix starts hex-parsing mode, and then each digit is processed in turn.
The problem is that a hex constant doesn't have an explicit terminator token. So, how do I switch back to the INITIAL state? By the time I know that the next character isn't part of the numeric constant, it's been consumed.
I can always push it back onto the input stream with unput():
<hexnumber>. { unput(*yytext); BEGIN INITIAL; }
...but I'd very much prefer not to do this (because of implementation details beyond the scope of this question, using unput() is very expensive for me).
I know that the generated state machine is capable of automatically switching back to the INITIAL state without consuming the next character, because otherwise rules like [0-9A-F]+ wouldn't work. Is there a way to achieve this using explicit start conditions?
Use yyless(0) instead of unput(*yytext); yyless is essentially free since it only adjusts a couple of pointers. It makes no attempt to reallocate or move the input buffer. (You also need BEGIN(INITIAL), of course.)
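Put together, a minimal sketch of a complete scanner using this approach might look like the following. The body of process_digit(), the value accumulator, the finish_number() helper, the catch-all rule and main() are all assumptions added for illustration; only the start condition and the yyless(0) rule come from the question and the suggestion above.

```lex
%{
/* Sketch only. process_digit() is the question's helper; its body,
   `value` and finish_number() are hypothetical stand-ins. */
#include <stdio.h>
static unsigned long value;
static void process_digit(void);
static void finish_number(void);
%}
%option noyywrap
%x hexnumber
%%
"0x"                 { value = 0; BEGIN(hexnumber); }
<hexnumber>[0-9A-F]  { process_digit(); }
<hexnumber>.|\n      { /* Not a hex digit: hand the whole match back and
                          leave the start condition. Without the BEGIN,
                          the same rule would match again forever. */
                       yyless(0);
                       BEGIN(INITIAL);
                       finish_number(); }
<hexnumber><<EOF>>   { finish_number(); yyterminate(); }
.|\n                 { /* ignore everything else in this sketch */ }
%%
/* Fold the digit just matched (yytext[0]) into the running value. */
static void process_digit(void)
{
    char c = yytext[0];
    unsigned digit = (c >= '0' && c <= '9') ? (unsigned)(c - '0')
                                            : (unsigned)(c - 'A' + 10);
    value = value * 16 + digit;
}

/* Report the completed constant. */
static void finish_number(void)
{
    printf("hex value: %lu\n", value);
}

int main(void)
{
    yylex();
    return 0;
}
```

The character that ended the number is rescanned under INITIAL, so whatever rule would normally handle it there still sees it.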
A much messier solution would be to use trailing context to distinguish between a hex digit that is followed by another hex digit and the last digit of the number:
<hexnumber>[[:xdigit:]]/[[:xdigit:]] process_digit();
<hexnumber>[[:xdigit:]] process_digit(); BEGIN(INITIAL);
But that is a lot less flexible.