I am relatively new to Spirit Qi, and am trying to parse an assembler-like language.
For example, I'd like to parse:
Func Ident{
Mov name, "hello"
Push 5
Exit
}
So far, so good. I can parse it properly. However, the error handler sometimes comes up with strange error locations. Take for example the following faulty code:
Func Ident{
Mov name "hello" ; <-- comma is missing here
Push 5
Exit
}
Here are the rules involved in this parsing:
gr_function = lexeme["Func" >> !(alnum | '_')] // Ensure whole words
    > gr_identifier
    > "{"
    > *( gr_instruction
       | gr_label
       | gr_vardecl
       | gr_paramdecl)
    > "}";
gr_instruction = gr_instruction_names
    > gr_operands;
gr_operands = -(gr_operand % ',');
The parser notices the error, but complains about a missing "}" after the Mov. I have a feeling that the issue is in the definition for "Func", but I cannot pinpoint it. I'd like the parser to complain about a missing ",". It would be OK if it also complained about consequential errors, but it should definitely pinpoint the missing comma as the culprit.
I have tried variations such as:
gr_operands = -(gr_operand
    >> *(','
        > gr_operand)
    );
and others, but these only produced other strange errors.
Does anyone have an idea of how to say "Ok, you may have an instruction without operands, but if you find one, and there is no comma before the next, fail at the comma"?
UPDATE
Thank you for your answers so far. The gr_operand is defined as follows:
gr_operand = ( gr_operand_intlit
             | gr_operand_flplit
             | gr_operand_strlit
             | gr_operand_register
             | gr_operand_identifier);
gr_operand_intlit = int_;
gr_operand_flplit = double_;
gr_operand_strlit = '"'
    > strlitcont
    > '"'
    ;
gr_operand_register = gr_register_names;
// TODO: Must also not accept the keywords from the statement grammar
gr_operand_identifier = !(gr_instruction_names | gr_register_names)
    >> raw[
        lexeme[(alpha | '_') >> *(alnum | '_')]
    ];
escchar.name("\\\"");
escchar = '\\' >> char_("\"");
strlitcont.name("String literal content");
strlitcont = *( escchar | ~char_('"') );
You'll want to make it explicit what can be an operand. I guessed this:
gr_operand = gr_identifier | gr_string;
gr_string = lexeme [ '"' >> *("\"\"" | ~char_("\"")) >> '"' ];
Unrelated, but you might want to make it clear that a newline starts a new statement (using blank_type as the skipper):
    >> "{"
    >> -(
            gr_instruction
          | gr_label
          | gr_vardecl
          | gr_paramdecl
        ) % eol
    > "}";
Now the parser can report the failure at the exact point where it expected a newline (the end of the statement) instead of somewhere later in the input.
I made up a fully working sample using your sketches in the original post.
See it live on Coliru:
#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;

template <typename It, typename Skipper = qi::blank_type>
struct parser : qi::grammar<It, Skipper>
{
    parser() : parser::base_type(start)
    {
        using namespace qi;

        start = lexeme["Func" >> !(alnum | '_')] > function;
        // Statements are separated by newlines; the blank skipper leaves them alone
        function = gr_identifier
            >> "{"
            >> -(
                    gr_instruction
                    //| gr_label
                    //| gr_vardecl
                    //| gr_paramdecl
                ) % eol
            > "}";

        gr_instruction_names.add("Mov", unused);
        gr_instruction_names.add("Push", unused);
        gr_instruction_names.add("Exit", unused);

        // Match whole instruction names only
        gr_instruction = lexeme [ gr_instruction_names >> !(alnum|"_") ] > gr_operands;
        gr_operands = -(gr_operand % ',');
        gr_identifier = lexeme [ alpha >> *(alnum | '_') ];
        gr_operand = gr_identifier | gr_string;
        gr_string = lexeme [ '"' >> *("\"\"" | ~char_("\"")) >> '"' ];

        BOOST_SPIRIT_DEBUG_NODES((start)(function)(gr_instruction)(gr_operands)(gr_identifier)(gr_operand)(gr_string));
    }

  private:
    qi::symbols<char, qi::unused_type> gr_instruction_names;
    qi::rule<It, Skipper> start, function, gr_instruction, gr_operands, gr_identifier, gr_operand, gr_string;
};

int main()
{
    typedef boost::spirit::istream_iterator It;
    std::cin.unsetf(std::ios::skipws);
    It f(std::cin), l;

    parser<It, qi::blank_type> p;

    try
    {
        bool ok = qi::phrase_parse(f,l,p,qi::blank);
        if (ok) std::cout << "parse success\n";
        else std::cerr << "parse failed: '" << std::string(f,l) << "'\n";

        if (f!=l) std::cerr << "trailing unparsed: '" << std::string(f,l) << "'\n";

        return ok ? 0 : 1; // exit code 0 signals success
    } catch(const qi::expectation_failure<It>& e)
    {
        // e.first..e.last is the input remaining at the point of failure
        std::string frag(e.first, e.last);
        std::cerr << e.what() << "'" << frag << "'\n";
    }

    return 1;
}