I created a Bison parser to convert input data to XML. Whenever I run my parser the output contains the XML and following the XML is this symbol:
>
Yikes! That greater-than symbol makes the XML not well-formed.
What is producing that greater-than symbol? How do I tell Bison to not generate it? I don't think there is anything in my "main" function that tells Bison to output a greater-than symbol:
    int main(int argc, char *argv[])
    {
        yyin = fopen(argv[1], "r");
        yyparse();
        fclose(yyin);
        return 0;
    }
At a guess, you are using (f)lex and none of your lexer patterns match the > character, so the default action is used. The (f)lex default action is to copy each unmatched character to yyout (which is stdout unless you reassign it), and that is often undesirable.
I recommend always using %option nodefault in your flex files. That will cause flex to issue a warning if anything could trigger the default rule (and it changes the default rule to a fatal error). Unfortunately it doesn't actually tell you which characters can't be matched, but you should be able to figure that out by looking at your rules.
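As a rough sketch of what that can look like (the whitespace rule and the error message are placeholders I made up, not something from your scanner), here the option is combined with an explicit catch-all rule, so stray characters are reported on stderr instead of leaking into the XML:

```lex
%{
#include <stdio.h>
%}
/* nodefault: no implicit echo rule; flex warns if some input can match nothing. */
%option nodefault

%%
[ \t\r\n]+   { /* placeholder rule: skip whitespace */ }
.            { fprintf(stderr, "unexpected character '%c'\n", yytext[0]); }
```

The catch-all has to come last. Flex prefers the longest match and, among equal-length matches, the earliest rule, so placing it at the end means it only fires when none of your real token rules apply.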
A good debugging technique is to compile a version of your scanner with debugging traces, and then write a simple wrapper which just calls yylex in a loop until it returns 0. (Printing out the tokens returned is more work but can be useful, too.) Run that with test inputs until you are satisfied that the scanner works as expected.
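A minimal sketch of such a wrapper, assuming nothing about your scanner beyond the usual "return 0 at end of input" convention. The names dump_tokens, scan_fn and fake_lex are made up for illustration; fake_lex is only a stand-in so the snippet compiles by itself, and in your program you would pass yylex instead.

```c
#include <stdio.h>

/* Signature shared by yylex() and the stand-in scanner below. */
typedef int (*scan_fn)(void);

/* Call the scanner until it returns 0 (end of input), printing each
   token code on `out`. Returns the number of tokens seen. */
int dump_tokens(scan_fn next_token, FILE *out)
{
    int count = 0;
    int tok;

    while ((tok = next_token()) != 0) {
        fprintf(out, "token %d\n", tok);
        ++count;
    }
    return count;
}

/* Hypothetical stand-in for yylex(): yields three token codes, then 0. */
int fake_lex(void)
{
    static const int tokens[] = { 258, 259, '>', 0 };
    static int pos = 0;
    int tok = tokens[pos];

    if (tok != 0)
        ++pos;              /* stay on the terminating 0 once it is reached */
    return tok;
}
```

In a real test driver you would set yyin as in your main and call dump_tokens(yylex, stderr). Generating the scanner with flex -d additionally makes it report each rule it matches, which is the trace mentioned above.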
Bison also has a trace facility (build the parser with -t or --debug and set yydebug to a nonzero value), described in the "Debugging Your Parser" chapter of the Bison manual. It can also help you debug scanner issues, but you have to wade through a lot more debug logging.