I'm trying to extract information from a C/C++ source file. Specifically, I want to extract the content of a macro.
E.g.:
MYMACRO(random content)
Here, random content should be extracted.
MYMACRO (random content)
Here, too, random content should be extracted.
Problem:
Bison won't recognize MYMACRO as a token.
This code is only a first step and expects just the macro itself as input.
Lex file: parser.l
%{
#include <iostream>
#include <string.h>  // strdup
#include "parser.tab.h"
using namespace std;
extern int yylex();
%}
%option noyywrap
%%
"MYMACRO" {
return EXTRACT_CONTENT_START;
}
[(] {
return BRACE_OPEN;
}
[)] {
return BRACE_CLOSE;
}
.* {
yylval.sval = strdup(yytext);
return ANY_TEXT;
}
%%
Bison file: parser.y
%{
#include <iostream>
#include <stdlib.h>  // exit
#include <string.h>
using namespace std;
extern int yylex();
extern int yyparse();
extern int yy_scan_string(char const *);
void yyerror(const char *s);
%}
%union {
int ival;
char * sval;
char cval;
}
%error-verbose
%token EXTRACT_CONTENT_START
%token <cval> BRACE_OPEN
%token <cval> BRACE_CLOSE
%token <sval> ANY_TEXT
%%
program:
EXTRACT_CONTENT_START
BRACE_OPEN
ANY_TEXT
BRACE_CLOSE
;
%%
int main(int ,char**){
yy_scan_string("MYMACRO(random content)");
yyparse();
}
void yyerror(const char *s) {
cout << endl << s << endl;
exit(-1);
}
random content
unexpected ANY_TEXT, expecting EXTRACT_CONTENT_START
(So, regarding flex: instead of the first matching rule, the last rule is actually being used.)
I've also tried using states, changing the last rule in the flex file to
<STATE_CONTENT> .* {
yylval.sval = strdup(yytext);
return ANY_TEXT;
}
But this results in an "unrecognized rule" error on the line containing %%.
The reason why the last rule is preferred:
lex always uses the longest match. Since .* matches more characters than any other rule (everything up to the end of the line), ANY_TEXT is always chosen.
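The longest-match behaviour can be reproduced outside of flex. The following is only a rough sketch: it uses std::regex as a stand-in for flex's matcher, and the names Rule and pick_rule are made up for illustration. It tries every rule at the start of the input and keeps the longest match, preferring the earlier rule on a tie.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for one flex rule: a pattern and the token it returns.
struct Rule {
    std::string pattern;
    std::string token;
};

// Returns the token of the rule that wins at the start of `input`:
// the longest match is chosen; on a tie, the rule listed first wins.
std::string pick_rule(const std::vector<Rule>& rules, const std::string& input) {
    assert(!rules.empty());
    std::string best_token = "<no match>";
    std::size_t best_len = 0;
    for (const Rule& r : rules) {
        std::regex re(r.pattern);
        std::smatch m;
        // match_continuous: the match has to begin at the first character.
        if (std::regex_search(input, m, re,
                              std::regex_constants::match_continuous)) {
            std::size_t len = static_cast<std::size_t>(m.length(0));
            // Strictly greater, so an earlier rule keeps priority on a tie.
            if (len > best_len) {
                best_len = len;
                best_token = r.token;
            }
        }
    }
    return best_token;
}
```

Called with the four rules from parser.l (MYMACRO, [(], [)], .*) and the input MYMACRO(random content), pick_rule returns "ANY_TEXT": .* matches all 23 characters, while MYMACRO matches only 7.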
To solve it, change the files like this:
parser.l:
Remove the .* rule and add this one:
. {
yylval.cval = *yytext;
return ANY_CHAR;
}
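This rule never matches more than one character. MYMACRO is a longer match and therefore wins. For ( and ) the match lengths tie, and in that case flex picks the rule listed first. The . rule must therefore stay last, where it acts as the lowest-priority fallback.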
parser.y:
Add a new token:
%token <cval> ANY_CHAR
To act on the whole string, add this rule and use anyText in place of ANY_TEXT in the program rule:
anyText:
anyText ANY_CHAR { cout << $2; }
|
;
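To see how the single-character tokens and the anyText rule fit together, here is a hand-written C++ sketch of the same idea. It is not what flex or bison generate: scan and extract are hypothetical helpers that mimic the corrected parser.l and the grammar program: EXTRACT_CONTENT_START BRACE_OPEN anyText BRACE_CLOSE. It also collects the characters into a string instead of printing them one by one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Token kinds mirroring the %token declarations in parser.y.
enum class Tok { EXTRACT_CONTENT_START, BRACE_OPEN, BRACE_CLOSE, ANY_CHAR };

// A token plus the character it carries (only meaningful for ANY_CHAR).
struct Token {
    Tok kind;
    char cval;
};

// Hypothetical scanner mimicking the corrected parser.l:
// MYMACRO first, then the braces, then a one-character fallback.
std::vector<Token> scan(const std::string& in) {
    const std::string macro = "MYMACRO";
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in.compare(i, macro.size(), macro) == 0) {
            out.push_back({Tok::EXTRACT_CONTENT_START, '\0'});
            i += macro.size();
        } else if (in[i] == '(') {
            out.push_back({Tok::BRACE_OPEN, '('});
            ++i;
        } else if (in[i] == ')') {
            out.push_back({Tok::BRACE_CLOSE, ')'});
            ++i;
        } else if (in[i] == '\n') {
            ++i;  // '.' does not match a newline in flex; this sketch drops it
        } else {
            out.push_back({Tok::ANY_CHAR, in[i]});
            ++i;
        }
    }
    assert(i == in.size());  // every input character was consumed
    return out;
}

// Hand-written equivalent of:
//   program: EXTRACT_CONTENT_START BRACE_OPEN anyText BRACE_CLOSE
//   anyText: anyText ANY_CHAR | /* empty */
// Returns true on success and stores the macro content in `content`.
bool extract(const std::vector<Token>& toks, std::string& content) {
    content.clear();
    std::size_t i = 0;
    if (i >= toks.size() || toks[i].kind != Tok::EXTRACT_CONTENT_START) return false;
    ++i;
    if (i >= toks.size() || toks[i].kind != Tok::BRACE_OPEN) return false;
    ++i;
    // anyText: zero or more ANY_CHAR tokens, appended in order.
    while (i < toks.size() && toks[i].kind == Tok::ANY_CHAR) {
        content += toks[i].cval;
        ++i;
    }
    if (i >= toks.size() || toks[i].kind != Tok::BRACE_CLOSE) return false;
    ++i;
    return i == toks.size();  // nothing may follow the closing brace
}
```

For the input MYMACRO(random content), scan produces 17 tokens (the macro name, the opening brace, 14 ANY_CHAR tokens and the closing brace), and extract stores random content in its output parameter.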
Regarding the state problem, the answer from rici:
You cannot put whitespace before the pattern, whether or not it is preceded by a state. Another way of saying that, which is technically more accurate, is that patterns cannot contain unquoted whitespace, and the prefix is part of the pattern.
So the rule has to be written as <STATE_CONTENT>.* with no space after the closing >.