Correctly set the span of an expectation_failure in boost::spirit


I am trying to add error reporting to my parser, and I don't know how to do it correctly for a specific rule.

The rule in question matches function calls of the form mag(12.5) or sqrt(5.38). There is a fixed list of function names, and each one can parse its parameters differently than the other ones (time(4) only accepts int values, for example). My grammar produces an AST where each function has its own node type (Mag, Sqrt and Time).

My first implementation was simple: there was one rule for each function I support.

fn %= mag | sqrt | time;
mag %= (lit("mag") >> lit('(') > double_ > lit(')'));
sqrt %= (lit("sqrt") >> lit('(') > double_ > lit(')'));
time %= (lit("time") >> lit('(') > int_ > lit(')'));

This works, but if the input contains a function name that is not supported (hello(12)), the rule fails without an error. What I want is the rule to fail with an expectation_failure (or similar), that would say "Expected mag, sqrt or time, got 'hello'".

Below is my attempt to generate an error. It reads any ident followed by an opening parenthesis (using an expectation operator), and then uses a predicate in eps to do two things: generate the correct node depending on the function name, and fail if the name is unknown, thus generating the expectation_failure. The problem is that the location of the expectation_failure is not what I want. It produces:

Expected <function parameters>
Got 12)

Instead of

Expected <mag, sqrt or time>
Got hello

Is there a way to control the values of expectation_failure::first and ::last? Or is there another way to report an error than an expectation_failure that I should use? Also, I don't understand why my expectation_failure points to "12)" and not just "12" in this case.

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_function.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/variant.hpp>
#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace spirit = boost::spirit;

struct Mag { double val; };
struct Sqrt { double val; };
struct Time { int val; };
using Fn = boost::variant<Mag, Sqrt, Time>;

std::ostream& operator<<(std::ostream& os, const Mag& v) {
  os << "mag(" << v.val << ")";
  return os;
}

std::ostream& operator<<(std::ostream& os, const Sqrt& v) {
  os << "sqrt(" << v.val << ")";
  return os;
}

std::ostream& operator<<(std::ostream& os, const Time& v) {
  os << "time(" << v.val << ")";
  return os;
}


BOOST_FUSION_ADAPT_STRUCT(Mag, (double, val))
BOOST_FUSION_ADAPT_STRUCT(Sqrt, (double, val))
BOOST_FUSION_ADAPT_STRUCT(Time, (int, val))


void makeMag_(Fn& fn, double val) {
  Mag s;
  s.val = val;
  fn = s;  // assign directly; swap() cannot bind to a temporary Fn
}

void makeSqrt_(Fn& fn, double val) {
  Sqrt s;
  s.val = val;
  fn = s;
}

void makeTime_(Fn& fn, int val) {
  Time s;
  s.val = val;
  fn = s;
}

BOOST_PHOENIX_ADAPT_FUNCTION(void, makeMag, makeMag_, 2)
BOOST_PHOENIX_ADAPT_FUNCTION(void, makeSqrt, makeSqrt_, 2)
BOOST_PHOENIX_ADAPT_FUNCTION(void, makeTime, makeTime_, 2)

template <typename Iterator>
struct FnParser : qi::grammar<Iterator, qi::locals<std::string>, ascii::space_type, Fn()>
{
  FnParser() : FnParser::base_type(fn)
  {
    using qi::double_;
    using qi::int_;
    using qi::_val;
    using qi::_1;
    using qi::_a;
    using qi::_r1;
    using qi::eps;
    using qi::lit;
    using qi::lexeme;
    using qi::alpha;

    ident %= lexeme[+alpha];

    fnParams =
          (eps(_r1 == "mag")  >> double_)  [makeMag(_val, _1)]
        | (eps(_r1 == "sqrt") >> double_)  [makeSqrt(_val, _1)]
        | (eps(_r1 == "time") >> int_)     [makeTime(_val, _1)]
        ;

    fn =  ident        [_a = _1]
        > lit('(')
        > fnParams(_a) [_val = _1]
        > lit(')');

    ident.name("identifier");
    fnParams.name("function parameters");
    fn.name("function");
  }

  qi::rule<Iterator, qi::locals<std::string>, ascii::space_type, Fn()> fn;
  qi::rule<Iterator, ascii::space_type, Fn(std::string)> fnParams;
  qi::rule<Iterator, ascii::space_type, std::string()> ident;
};

int main() {

      using Iter = std::string::const_iterator;
      using boost::spirit::ascii::space;

      FnParser <Iter> parser;

      std::string str;

      while (std::getline(std::cin, str)) {

        if (str.empty() || str[0] == 'q' || str[0] == 'Q')
          break;

        Iter iter = str.begin();
        Iter end = str.end();
        Fn fn;

        try {
          bool r = phrase_parse(iter, end, parser, space, fn);

          if (r && iter == end) {
            std::cout << "Ok\n";
          } else {
            std::string rest(iter, end);
            std::cout << "Failed\n"
                      << "Stopped at \"" << rest << "\"\n";
          }
        } catch (const qi::expectation_failure<Iter>& e) {
          // [e.first, e.last) runs from the failure position to the end of input
          std::string got(e.first, e.last);
          std::cout << "Expected " << e.what_ << "\n"
                    << "Got " << got << "\n";
        }
      }
    }

Edit

I did not give the full grammar, and so some context may be missing. Apart from function calls, the full grammar has arithmetic operators and variables. The only way to tell apart a function call from a variable is the presence of an opening parenthesis afterwards. Both can be used in the same context and I use an ordered alternative fn | var to give priority to the function call. This is why I put the expectation point after the parenthesis, and not before.


Solution

  • You already control the location of the expectation failure.

    In e.g.

    mag %= (lit("mag") >> lit('(') > double_ > lit(')'));
    

    the expectation point is > double_. To move it to the start of the argument list, say:

    mag %= lit("mag") > (lit('(') >> double_ > lit(')'));
    

    By the way, you can write this as:

    mag = "mag" > ('(' >> double_ >> ')');
    

    Also, I don't understand why my expectation_failure points to "12)" and not just "12" in this case.

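    The exception's first marks where the expectation failed, and its last is the end of the input range handed to the parser, so printing [first, last) always runs to the end of the input. It might stop at the last portion of input seen in the case of input iterators (qi::istream_iterator), but that's guessing.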
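To print only the offending token rather than the rest of the input, one option is to trim the [first, last) range yourself. The helper below is a minimal sketch that uses only the standard library; the name leadingToken and its notion of a "token" (leading whitespace skipped, then a run of alphanumeric characters) are assumptions made for illustration, not part of Spirit.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns the first identifier-like token in [first, last).
// Leading whitespace is skipped; the token is a run of alphanumeric characters.
// If the range starts with some other character, that single character is returned.
template <typename It>
std::string leadingToken(It first, It last)
{
    while (first != last && std::isspace(static_cast<unsigned char>(*first)))
        ++first;

    It stop = first;
    while (stop != last && std::isalnum(static_cast<unsigned char>(*stop)))
        ++stop;

    if (stop == first && stop != last)
        ++stop;  // not alphanumeric: report the single offending character

    return std::string(first, stop);
}
```

In the catch block of the question this would be called as leadingToken(e.first, e.last). Note that a value such as 12.5 is cut at the dot under this simple definition, so adjust the character class to your grammar.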

    As a side note, you might get more control using on_error which is documented here: https://www.boost.org/doc/libs/1_67_0/libs/spirit/doc/html/spirit/qi/tutorials/mini_xml___error_handling.html and in the compiler examples.


    Update

    To the EDIT

    The only way to tell apart a function call from a variable is the presence of an opening parenthesis afterwards. Both can be used in the same context and I use an ordered alternative fn | var to give priority to the function call. This is why I put the expectation point after the parenthesis, and not before.

    You can still have it:

    mag = "mag" >> &lit('(') > ('(' >> double_ >> ')');
    

    This uses the lookahead &lit('(') to enter a branch, then starts with the expectation point. So a missing '(' is just a non-match, but the expectation point still fires at the argument list.

    Use !lit('(') for a negative lookahead assertion.

    Other Ideas

    You state:

    The only way to tell apart a function call from a variable is the presence of an opening parenthesis afterwards

    This of course depends on your choice about symbol-tables and semantic analysis. See these examples where I did do the function-detection dynamically:
