Is it possible to completely ignore certain characters in Lex? Any regex excluding characters will break apart the tokens where those characters occur rather than completely ignoring them. I am aware of the semicolon rule wherein the text is ignored; however, including a regex later on that accepts any characters still accepts characters previously declared to be ignored. Having that regex ignore those characters causes it to break the token when it meets them instead of skipping past them.
Is it possible to completely ignore certain characters in Lex?
No. The original AT&T lex utility has nothing that would support this, nor does POSIX specify any such thing. Input is read from the specified stream and matched directly against the provided patterns. Every character obtained from the input is subject to matching; the only opportunities to alter character content are before lex reads it in or after lex has tokenized it.
It would be possible, but extremely messy, to write a ruleset and corresponding actions that behave as if some specified character were completely ignored. Instead, your best bet is to ensure that the characters in question are stripped out before lex ever sees them.
With traditional and POSIX lex, data are read from a stream designated to the lexer via the global stream pointer yyin. Standard C provides no mechanism for wrapping or internally filtering streams, but you could insert an external filter by having your program fork, with the child reading the original input data, stripping out the unwanted characters, and piping the rest to the parent process. The parent, meanwhile, wraps the read end of the pipe in a stream (with fdopen(), for example) and assigns that to yyin.
On the other hand, if you use Flex instead of traditional lex, then you have the alternative of redefining the YY_INPUT() macro to filter out the unwanted characters before they reach the scanner proper. This is lighter-weight than forking, and it can be expressed in the flex input file, rather than requiring the program using the scanner to set up the filter.
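For illustration, here is one way such a YY_INPUT() redefinition might look. This is C code intended for the %{ ... %} block in the definitions section of the flex input file. The names filtered_read and ignored_chars are invented for this sketch, and the choice of "," and ";" as the characters to hide is only an example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical set of characters to hide from the scanner. */
static const char ignored_chars[] = ",;";

/* Hypothetical helper: fill buf with up to max_size characters from `in`,
 * skipping any character listed in ignored_chars. */
static size_t filtered_read(FILE *in, char *buf, size_t max_size)
{
    size_t n = 0;
    int c;
    while (n < max_size && (c = getc(in)) != EOF) {
        /* NUL bytes pass through; strchr would otherwise match the terminator. */
        if (c == '\0' || strchr(ignored_chars, c) == NULL)
            buf[n++] = (char)c;
    }
    return n;   /* 0 means no more input, which flex treats as end of file */
}

/* flex expands this wherever it needs to refill its buffer. */
#define YY_INPUT(buf, result, max_size) \
    ((result) = filtered_read(yyin, (buf), (size_t)(max_size)))
```

Because the helper keeps reading until the buffer is full or the input ends, this version suits files and pipes better than interactive input, where it would block waiting for more characters.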
Either way, however, there is no built-in feature specifically for pretending that particular characters did not appear in the input at all.