I have a lex grammar that contains rules for double-quoted strings:
...
%x DOUBLEQUOTE
...
%%
"\"" { yylval->string = NULL; BEGIN(DOUBLEQUOTE); }
<DOUBLEQUOTE> {
    "\n"        {
                    /* reset column counter on new line */
                    PARSER->linepos = 0;
                    (PARSER->linenum)++;
                    expr_parser_append_string(PARSER, &(yylval->string), yytext);
                }
    [^\"\n]+    { expr_parser_append_string(PARSER, &(yylval->string), yytext); }
    "\\\""      { expr_parser_append_string(PARSER, &(yylval->string), yytext); }
    "\""        {
                    BEGIN(INITIAL);
                    if ( yylval->string != NULL )
                        string_unescape_c(yylval->string);
                    return ( TOKEN_STRING );
                }
}
Somehow the escape sequence \" is matched only at the beginning of a string. If \" appears later in a string, the characters \ and " seem to be matched separately.
For instance:
Passes: "\" "
Fails: " \" "
Fails: "This is string example: \"a string inside of string\""
Why is the escape sequence \" not matched by the rule "\\\"" when it appears later in a string?
If the backslash is not the first character in the quoted string, it is consumed as the last character of a longer token, because [^\"\n]+ does not exclude backslashes and flex always prefers the longest match. For example:
"abc\"def"
 ^^^^      First token, longest match of [^"\n]+
     ^     Terminates quoted string
So you need to exclude backslashes as well. But once you do that, you need to provide a pattern which does match backslash escapes, not just backslash-escaped quotes. So I'd suggest:
<DOUBLEQUOTE>{
    \\?\n               { /* Handle newline */ }
    ([^"\\\n]|\\.)+     { expr_parser_append_string(PARSER,
                                                    &yylval->string,
                                                    yytext); }
    \"                  { BEGIN(INITIAL); ... }
}
Note: I added an optional backslash to the beginning of the first pattern in order to handle the case where a backslash immediately precedes a newline character. The . in the second pattern (\\.) will not match a newline, so otherwise backslash-newline wouldn't be recognized at all.
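If you want to convince yourself of the difference without running flex, the two patterns can be imitated by hand in plain C. This is only a rough sketch: the helper names match_original and match_fixed are made up for illustration and are not part of flex or of the grammar above. Each one returns how many characters the corresponding pattern would consume from the start of its input.

```c
#include <stddef.h>

/* Hypothetical helper: length of the longest prefix of s that the
 * original pattern [^"\n]+ would match. A backslash is an ordinary
 * character here, so it gets swallowed with the preceding text. */
static size_t match_original(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0' && s[n] != '"' && s[n] != '\n')
        n++;
    return n;
}

/* Hypothetical helper: length of the longest prefix of s that the
 * suggested pattern ([^"\\\n]|\\.)+ would match. */
static size_t match_fixed(const char *s)
{
    size_t n = 0;
    for (;;) {
        if (s[n] == '\\' && s[n + 1] != '\0' && s[n + 1] != '\n') {
            /* \\. : backslash plus any character except newline */
            n += 2;
        } else if (s[n] != '\0' && s[n] != '"' &&
                   s[n] != '\\' && s[n] != '\n') {
            /* [^"\\\n] : anything but quote, backslash, newline */
            n += 1;
        } else {
            break;
        }
    }
    return n;
}
```

For the string body abc\"def" (written "abc\\\"def\"" as a C literal), match_original returns 4, stopping right before the escaped quote, which is then taken as the string terminator. match_fixed returns 8 and stops at the real closing quote. For a backslash followed by a newline, match_fixed stops before the backslash, which is why the \\?\n rule is needed.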