I'm trying to modify a flex+bison generator to allow the inclusion of code snippets denoted by surrounding '{{' and '}}'. Unlike the multi-line comment case, I must capture all of the content.
My attempts either fail in the case where the '{{' and the '}}' are on the same line or they are painfully slow.
My first attempt was something like this:
%{
#include <stdio.h>
// sscce implementation of a growing string buffer
char codeBlock[4096];
int codeOffset;
const char* curFilename = "file.l";
extern int yylineno;
void add_code_line(const char* yytext)
{
codeOffset += sprintf(codeBlock + codeOffset, "#line %d \"%s\"\n\t%s\n", yylineno, curFilename, yytext);
}
%}
%option stack
%option yylineno
%x CODE_FRAG
%%
"{{"[ \n]* { codeOffset = 0; yy_push_state(CODE_FRAG); }
<CODE_FRAG>"}}" { codeBlock[codeOffset] = 0; printf("// code\n%s\n", codeBlock); yy_pop_state(); }
<CODE_FRAG>[^\n]* { add_code_line(yytext); }
<CODE_FRAG>\n
\n
.
Note: the "codeBlock" implementation is a contrivance for the purpose of an SSCCE only. It's not what I'm actually using.
This works for a simple test case:
{{ from line 1
from line 2
}}
{{
from line 7
}}
Outputs
// code
#line 1 "file.l"
from line 1
#line 2 "file.l"
from line 2
// code
#line 7 "file.l"
from line 7
But it can't handle
{{ hello }}
The two solutions I can think of are:
/* capture character-by-character */
<CODE_FRAG>. { add_code_character(yytext[0]); }
And
<INITIAL>"{{".*?"}}" { int n = strlen(yytext); yytext[n - 2] = 0; add_code(yytext + 2); }
The former seems likely to be slow, and the latter just feels wrong.
Any ideas?
--- EDIT ---
The following appears to achieve the result desired, but I'm not sure if it's a "good" Flex way to do this:
"{{"[ \n]* { codeOffset = 0; yy_push_state(CODE_FRAG); }
<CODE_FRAG>"}}" { codeBlock[codeOffset] = 0; printf("// code\n%s\n", codeBlock); yy_pop_state(); }
<CODE_FRAG>.*?/"}}" { add_code_line(yytext); }
<CODE_FRAG>.*? { add_code_line(yytext); }
<CODE_FRAG>\n
Flex doesn't implement non-greedy matches, so .*? won't work the way you expect it to in flex. (It will be an optional .*, which is indistinguishable from .* on its own.)
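To make the difference concrete, here is a small plain-C sketch (not flex code) that compares the two behaviours on a single line with no embedded newline. The helper names greedy_span and lazy_span are made up for this illustration; greedy_span mimics what a longest-match rule such as "{{".*"}}" covers, and lazy_span shows what a non-greedy match would have covered.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the span from the first "{{" to the
 * LAST "}}" on the line, i.e. what a longest-match rule would cover.
 * Returns 0 if there is no such span. Assumes `line` has no newline. */
size_t greedy_span(const char *line)
{
    const char *open = strstr(line, "{{");
    const char *last = NULL;
    const char *p;

    if (open == NULL)
        return 0;
    for (p = strstr(open + 2, "}}"); p != NULL; p = strstr(p + 1, "}}"))
        last = p;                       /* remember the right-most "}}" */
    return last ? (size_t)(last + 2 - open) : 0;
}

/* Hypothetical helper: the same span, but ending at the FIRST "}}",
 * which is what a non-greedy match would have produced. */
size_t lazy_span(const char *line)
{
    const char *open = strstr(line, "{{");
    const char *first = open ? strstr(open + 2, "}}") : NULL;

    return first ? (size_t)(first + 2 - open) : 0;
}
```

On a line holding two fragments, such as {{ a }} x {{ b }}, the greedy span swallows both fragments and the text between them, while the lazy span stops after the first one.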
Here's a regular expression which will match from {{ as far as possible without a }}:
"{{"([}]?[^}])*
That might not be what you want, since it won't allow nested {{...}} within your code blocks. However, you didn't mention that as a requirement and none of your examples functions that way.
The above regular expression does not match the closing }}, which appears to be what you want since it lets you call add_code(yytext+2) without modifying the temporary buffer. However, you do need to deal with the }} in your action. See below.
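If it helps to see what that pattern accepts, the following plain-C sketch emulates it by hand on a NUL-terminated string. It is only an illustration of the matching rule (every } in the body must be followed by a character that is not }); the function name fragment_match_len is invented here and is not part of flex.

```c
#include <stddef.h>

/* Emulates the pattern "{{"([}]?[^}])* on a NUL-terminated string.
 * Returns the length of the match, or 0 if s does not start with "{{".
 * The NUL terminator plays the role of end-of-input. */
size_t fragment_match_len(const char *s)
{
    size_t i;

    if (s[0] != '{' || s[1] != '{')
        return 0;
    i = 2;
    for (;;) {
        if (s[i] == '\0')
            break;                      /* end of input */
        if (s[i] != '}')
            i += 1;                     /* [^}] on its own */
        else if (s[i + 1] != '}' && s[i + 1] != '\0')
            i += 2;                     /* a lone } followed by [^}] */
        else
            break;                      /* "}}" or a trailing } : stop */
    }
    return i;
}
```

For the input {{ hello }} rest the match is the nine characters up to, but not including, the closing }}. A single } inside the body, as in {{ a } b }}, does not end the match.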
The regular expression above will match to the end of the file if there is no matching }}. You probably want to deal with that as an error; the simplest way is to check whether EOF is encountered while you are trying to ignore the }}:
"{{"([}]?[^}])* { add_code(yytext+2);
if (input() == EOF || input() == EOF) {
/* Produce an error, unclosed {{ */
}
}
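The same control flow can be sketched outside of flex, which may be handy for unit-testing the idea: locate the body, then insist that the two closing braces are really there, and report an error otherwise. The function find_fragment and its return codes are assumptions made for this sketch, not anything flex provides.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the rule and its action.
 * Looks for the first {{...}} fragment in `in`.
 * Returns  1 and sets *body / *len to the text between the braces,
 *          0 if there is no "{{" at all,
 *         -1 if a "{{" is opened but never closed. */
int find_fragment(const char *in, const char **body, size_t *len)
{
    const char *open = strstr(in, "{{");
    const char *p;

    if (open == NULL)
        return 0;
    p = open + 2;
    /* Advance until the closing "}}" or the end of the input. */
    while (*p != '\0' && !(p[0] == '}' && p[1] == '}'))
        p++;
    if (*p == '\0')
        return -1;                      /* unclosed {{ */
    *body = open + 2;
    *len = (size_t)(p - *body);
    return 1;
}
```

The -1 branch corresponds to the error case in the action above: the input ran out before both closing braces could be consumed.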