In my scanner.lex file I have this:
{Some rule that matches strings} return STRING; //STRING is enum
In my C++ file I have this:
if (yylex() == STRING) {
    cout << "STRING: " << yytext << endl;
}
Obviously with some logic to take the input from stdin.
Now if this program gets the input "Hello\nWorld", my output is "STRING: Hello\nWorld", while I would want my output to be:
Hello
World
The same goes for other escape characters such as \", \0, \x<hex_number>, \t, \\ ... But I'm not sure how to achieve this. I'm not even sure if that's a flex issue or if I can solve this using only C++ tools...
How can I get this done?
As @Some programmer dude mentions in a comment, there is an example of how to do this using start conditions in the Flex documentation. That example puts the escaping rules into a separate start condition; each rule is implemented by appending the unescaped text to a buffer. And that's the way it's normally done.
Of course, you might find an external library which unescapes C-style escaped strings, which you could call on the string returned by flex. But that would be both slower and less flexible than the approach suggested in the Flex manual: slower because it requires a second scan of the string, and less flexible because the library is likely to have its own idea of what escapes to handle.
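For comparison, here is a minimal sketch of what such a second pass could look like in plain C++. The names unescape and hex_value are hypothetical helpers, not part of flex or of any particular library, and the set of escapes handled (\n, \t, \r, \0, \\, \" and \x with one or two hex digits) is an assumption about what you need.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of a hex digit, or -1 if `c` is not one.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical second-pass decoder for text that was already scanned.
// Unknown escapes are reduced to the escaped character itself.
std::string unescape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        // Ordinary characters (and a dangling final backslash) are copied.
        if (in[i] != '\\' || i + 1 == in.size()) {
            out += in[i];
            continue;
        }
        char c = in[++i];
        switch (c) {
            case 'n': out += '\n'; break;
            case 't': out += '\t'; break;
            case 'r': out += '\r'; break;
            case '0': out += '\0'; break;
            case 'x': {
                // Assumption: at most two hex digits follow \x.
                int value = 0, digits = 0;
                while (digits < 2 && i + 1 < in.size() && hex_value(in[i + 1]) >= 0) {
                    value = value * 16 + hex_value(in[++i]);
                    ++digits;
                }
                if (digits == 0) out += "\\x";  // no digits: keep the text as written
                else out += static_cast<char>(value);
                break;
            }
            default: out += c; break;  // covers \\ and \"
        }
    }
    return out;
}
```

You would call it as unescape(yytext) in the code that handles STRING. Note that this walks over every string a second time, which is exactly the cost described above.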
If you're using C++, you might find it more elegant to modify that example to use a std::string buffer instead of an arbitrary fixed-size character array. You can compile a flex-generated scanner with C++, so there is no problem using C++ standard library objects in your scanner code.
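A rough sketch of the std::string variant might look like the following. The flex rules appear only as comments, and they are an adaptation of the manual's example rather than a copy of it; the start condition name str, the variable string_buf and the helper functions are all hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical buffer that the rule actions append to; in a real scanner it
// could be declared in the definitions section of the .lex file.
static std::string string_buf;

// Each function is meant to be called from the rule shown in the comment
// above it. `str` is an assumed exclusive start condition (%x str).

// \"                 { begin_string(); BEGIN(str); }
void begin_string() { string_buf.clear(); }

// <str>[^\\\n\"]+    { append_text(yytext, yyleng); }
void append_text(const char* text, std::size_t len) { string_buf.append(text, len); }

// <str>\\n           { append_char('\n'); }      (likewise for \\t, \\0, ...)
// <str>\\(.|\n)      { append_char(yytext[1]); } (fallback: covers \\ and \")
void append_char(char c) { string_buf += c; }

// <str>\\x[0-9a-fA-F]{1,2}   { append_hex(yytext); }
// `text` is the whole match, e.g. the four characters \x41; skip the "\x".
void append_hex(const char* text) {
    string_buf += static_cast<char>(std::strtol(text + 2, nullptr, 16));
}

// <str>\"            { BEGIN(INITIAL); return STRING; }
// At this point string_buf holds the unescaped contents of the literal.
```

Because every escape is decoded as it is matched, the string is scanned only once, and you decide which escapes exist simply by adding or removing rules.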
Depending on the various semantic value types you are managing, you will probably want to modify the yylex prototype to either use an additional reference parameter or a more structured return type, in order to return the token value to the caller. Note that while it is OK to use yytext before the next call to yylex, it's not generally considered good style since it won't work with most parsers: in general, parsers require the ability to look one or more tokens ahead, and thus yytext is likely to be overwritten by the time your parser needs its value. The flex manual documents the macro hook (YY_DECL) used to modify the yylex() prototype.
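To illustrate the lookahead problem, here is a small sketch of a structured return type. Token, TokenType and FakeScanner are hypothetical names; FakeScanner is only a stand-in for a generated scanner so that the example compiles without flex. In a real .lex file you would instead redefine YY_DECL, for instance as shown in the comment, or add a reference parameter such as int yylex(std::string& value).

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Assumption: STRING mirrors the enum mentioned in the question.
enum TokenType { END = 0, STRING = 1 };

// Hypothetical structured return type: the token owns a copy of its text.
struct Token {
    TokenType type;
    std::string text;
};

// In the .lex file, the prototype could be changed along these lines:
//     #define YY_DECL Token yylex()
// The class below only imitates a scanner: like yytext, its internal
// buffer is reused for every token.
class FakeScanner {
public:
    explicit FakeScanner(std::vector<std::string> inputs)
        : inputs_(std::move(inputs)) {}

    // Plays the role of yytext: only valid until the next call to next().
    const char* raw_text() const { return buffer_.c_str(); }

    Token next() {
        if (pos_ == inputs_.size()) return Token{END, ""};
        buffer_ = inputs_[pos_++];      // overwrites the previous token's text
        return Token{STRING, buffer_};  // the copy survives later calls
    }

private:
    std::vector<std::string> inputs_;
    std::size_t pos_ = 0;
    std::string buffer_;
};
```

If a parser reads one token of lookahead before acting on the previous one, raw_text() already refers to the newer token, while the text stored in each Token is still intact.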