c++ c++17 grammar boost-spirit boost-spirit-x3

How can I resolve mismatched alternative parsers and their attributes?


I'm trying to parse into something of the form

enum class shape { ellipse, circle };
enum class other_shape { square, rectangle };
enum class position { top, left, right, bottom, center };
struct result
{
    std::variant<shape, other_shape, std::string> bla;
    position pos;
    std::vector<double> bloe;
};

I know this doesn't make much sense (why not merge shape and other_shape, right?), but I tried to simplify the result types into something that resembles the buildup of the real result. But the form of the input is somewhat flexible, such that I seem to need extra alternatives that do not properly map onto the above struct definition, resulting in "unexpected attribute size" static assertions.

The real problem is the comma between the bla+pos and bloe parts of the input, because either part may be omitted. Example inputs:

circle at center, 1, 2, 3
at top, 1, 2, 3
at bottom
circle, 1, 2, 3
1, 2, 3
my_fancy_shape at right, 1

Each time some part is omitted, it gets a default value (let's say the first value of the enum, and the first type in the variant).

My grammar looks somewhat like this

( circle
| ellipse
| square
| rectangle
| x3::attr(shape::circle)
) >> ( "at" >> position
     | x3::attr(position::center)
     ) >> -x3::lit(',')
  >> x3::double_ % ','

As you can see, the first alternative set maps directly to the variant (and includes a default value if it's completely omitted), the second alternative set provides a default value if the at portion is missing. Next is the vector of comma-separated values.

The issue I have here is that the above grammar will match both these invalid inputs:

, 1, 2, 3
circle 1, 2, 3

So the result, although somewhat elegant, is sloppy.

How can I, without altering the form of the result, write a grammar that has the required comma only if the first part is not empty?

I can think of grammars that do this by joining the two alternative sets into one set of all mixed possibilities, with the comma where it actually should appear, but then Spirit.X3 cannot map this alternative parser onto two members (a variant and a value). E.g. a very inefficient baseline with all the possibilities listed:

( circle >> x3::attr(position::center) >> ','
| ellipse >> x3::attr(position::center) >> ','
| square >> x3::attr(position::center) >> ','
| rectangle >> x3::attr(position::center) >> ','
| circle >> "at" >> position >> ','
| ellipse >> "at" >> position >> ','
| square >> "at" >> position >> ','
| rectangle >> "at" >> position >> ','
| x3::attr(shape::circle) >> "at" >> position >> ','
| x3::attr(shape::circle) >> x3::attr(position::center)
) >> x3::double_ % ','

Where the last option omits the comma, but aside from being quite excessive, X3 refuses to map this onto the result struct.


Solution

  • I'd model the grammar more simply: top-down, and shaped to match the AST.

    Simplifying the AST types:

    namespace AST {
        enum class shape       { ellipse, circle                  } ;
        enum class other_shape { square, rectangle                } ;
        enum class position    { top, left, right, bottom, center } ;
    
        using any_shape = std::variant<shape, other_shape, std::string>;
        using data = std::vector<double>;
    
        struct result {
            any_shape bla;
            position  pos;
            data      bloe;
        };
    }
    
    BOOST_FUSION_ADAPT_STRUCT(AST::result, bla, pos, bloe)
    

    I'd write the parser as:

    auto const data = as<AST::data>(double_ % ',');
    auto const position = kw("at") >> position_sym;
    
    auto const custom_shape =
            !(position|data) >> kw(as<std::string>(+identchar));
    auto const any_shape = as<AST::any_shape>(
            ikw(shape_sym) | ikw(other_shape_sym) | custom_shape);
    
    auto const shape_line = as<AST::result>(
            -any_shape >> -position >> (','|&EOL) >> -data);
    auto const shapes     = skip(blank) [ shape_line % eol ];
    

    This uses a few shorthand helper functions, as I often do:

    ////////////////
    // helpers - attribute coercion
    template <typename T>
    auto as  = [](auto p) {
        return rule<struct _, T> {typeid(T).name()} = p;
    };
    
    // keyword boundary detection
    auto identchar = alnum | char_("-_.");
    auto kw  = [](auto p) { return lexeme[p >> !identchar]; };
    auto ikw = [](auto p) { return no_case[kw(p)]; };
    
    auto const EOL = eol|eoi;
    

    With these helpers, the grammar already lands you in a better spot than the situation you reported:

    Live On Coliru

     ==== "circle at center, 1, 2, 3"
    Parsed 1 shapes
    shape:circle at center, 1, 2, 3
     ==== "at top, 1, 2, 3"
    Parsed 1 shapes
    shape:ellipse at top, 1, 2, 3
     ==== "at bottom"
    Parsed 1 shapes
    shape:ellipse at bottom
     ==== "1, 2, 3"
    Parse failed
    Remaining unparsed input: "1, 2, 3"
     ==== "my_fancy_shape at right, 1"
    Parsed 1 shapes
    custom:"my_fancy_shape" at right, 1
     ==== "circle at center, 1, 2, 3
                   at top, 1, 2, 3
                   at bottom
                   circle, 1, 2, 3
                   1, 2, 3
                   my_fancy_shape at right, 1"
    Parsed 4 shapes
    shape:circle at center, 1, 2, 3
    shape:ellipse at top, 1, 2, 3
    shape:ellipse at bottom
    shape:circle at top, 1, 2, 3
    Remaining unparsed input: "
                   1, 2, 3
                   my_fancy_shape at right, 1"
     ==== "circle, 1, 2 3"
    Parsed 1 shapes
    shape:circle at top, 1, 2
    Remaining unparsed input: " 3"
     ==== ", 1, 2, 3"
    Parsed 1 shapes
    shape:ellipse at top, 1, 2, 3
     ==== "circle 1, 2, 3"
    Parse failed
    Remaining unparsed input: "circle 1, 2, 3"
    

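    As you can see, two of the three invalid inputs are handled as intended: "circle, 1, 2 3" leaves " 3" unparsed and "circle 1, 2, 3" fails outright. The leading-comma input ", 1, 2, 3" is still accepted, though. And there's one input that you'd like to succeed, which doesn't: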

     ==== "1, 2, 3"
    Parse failed
    Remaining unparsed input: "1, 2, 3"
    

    HACKING

    This is tricky to get out of without writing an explosion of parsers. Notice that the trick to get ',' parsing correctly between the shape/position part and the data was ','|&EOL.

    What we'd actually need to be able to write is &BOL|','|&EOL. But there is no such thing as BOL. Let's emulate it!

    // hack for BOL state
    struct state_t {
        bool at_bol = true;
    
        struct set_bol {
            template <typename Ctx> void operator()(Ctx& ctx) const {
                auto& s = get<state_t>(ctx);
                //std::clog << std::boolalpha << "set_bol (from " << s.at_bol << ")" << std::endl;
                s.at_bol = true;
            }
        };
    
        struct reset_bol {
            template <typename Ctx> void operator()(Ctx& ctx) const {
                auto& s = get<state_t>(ctx);
                //std::clog << std::boolalpha << "reset_bol (from " << s.at_bol << ")" << std::endl;
                s.at_bol = false;
            }
        };
    
        struct is_at_bol {
            template <typename Ctx> void operator()(Ctx& ctx) const {
                auto& s = get<state_t>(ctx);
                //std::clog << std::boolalpha << "is_at_bol (" << s.at_bol << ")" << std::endl;
                _pass(ctx) = s.at_bol;
            }
        };
    };
    auto const SET_BOL   = eps[ state_t::set_bol{} ];
    auto const RESET_BOL = eps[ state_t::reset_bol{} ];
    auto const AT_BOL    = eps[ state_t::is_at_bol{} ];
    

    Now we can mix in the appropriate epsilons here and there:

    template <typename T>
    auto opt = [](auto p, T defval = {}) {
        return as<T>(p >> RESET_BOL | attr(defval));
    };
    
    auto const shape_line = as<AST::result>(
            with<state_t>(state_t{}) [
                SET_BOL >>
                opt<AST::any_shape>(any_shape) >>
                opt<AST::position>(position) >>
                (AT_BOL|','|&EOL) >> -data
            ]);
    

    It's ugly, but it works:

     ==== "circle at center, 1, 2, 3"
    Parsed 1 shapes
    shape:circle at center, 1, 2, 3
     ==== "at top, 1, 2, 3"
    Parsed 1 shapes
    shape:ellipse at top, 1, 2, 3
     ==== "at bottom"
    Parsed 1 shapes
    shape:ellipse at bottom
     ==== "1, 2, 3"
    Parsed 1 shapes
    shape:ellipse at top, 1, 2, 3
     ==== "my_fancy_shape at right, 1"
    Parsed 1 shapes
    custom:"my_fancy_shape" at right, 1
     ==== "circle at center, 1, 2, 3
                   at top, 1, 2, 3
                   at bottom
                   circle, 1, 2, 3
                   1, 2, 3
                   my_fancy_shape at right, 1"
    Parsed 6 shapes
    shape:circle at center, 1, 2, 3
    shape:ellipse at top, 1, 2, 3
    shape:ellipse at bottom
    shape:circle at top, 1, 2, 3
    shape:ellipse at top, 1, 2, 3
    custom:"my_fancy_shape" at right, 1
     ==== "circle, 1, 2 3"
    Parsed 1 shapes
    shape:circle at top, 1, 2
    Remaining unparsed input: " 3"
     ==== ", 1, 2, 3"
    Parsed 1 shapes
    shape:ellipse at top
    Remaining unparsed input: ", 1, 2, 3"
     ==== "circle 1, 2, 3"
    Parse failed
    Remaining unparsed input: "circle 1, 2, 3"
    

    Oh, you might add eoi to the shapes parser rule so you get slightly less confusing output when only part of the input is parsed, but that's up to you to decide.

    Full Demo

    Live On Wandbox¹

    //#define BOOST_SPIRIT_X3_DEBUG
    #include <boost/config/warning_disable.hpp>
    #include <boost/spirit/home/x3.hpp>
    #include <boost/fusion/adapted.hpp>
    
    #include <iostream>
    #include <iomanip>
    #include <stdexcept>
    #include <string>
    #include <variant>
    #include <vector>
    
    namespace AST {
        enum class shape       { ellipse, circle                  } ;
        enum class other_shape { square, rectangle                } ;
        enum class position    { top, left, right, bottom, center } ;
    
        using any_shape = std::variant<shape, other_shape, std::string>;
        using data = std::vector<double>;
    
        struct result {
            any_shape bla;
            position  pos;
            data      bloe;
        };
    
        static inline std::ostream& operator<<(std::ostream& os, shape const& v) {
            switch(v) {
                case shape::circle:  return os << "circle";
                case shape::ellipse: return os << "ellipse";
            }
            throw std::domain_error("shape");
        }
        static inline std::ostream& operator<<(std::ostream& os, other_shape const& v) {
            switch(v) {
                case other_shape::rectangle: return os << "rectangle";
                case other_shape::square:    return os << "square";
    
            }
            throw std::domain_error("other_shape");
        }
        static inline std::ostream& operator<<(std::ostream& os, position const& v) {
            switch(v) {
                case position::top:    return os << "top";
                case position::left:   return os << "left";
                case position::right:  return os << "right";
                case position::bottom: return os << "bottom";
                case position::center: return os << "center";
    
            }
            throw std::domain_error("position");
        }
    
        template <typename... F> struct overloads : F... {
            overloads(F... f) : F(f)... {}
            using F::operator()...;
        };
    
        static inline std::ostream& operator<<(std::ostream& os, any_shape const& v) {
            std::visit(overloads{
                [&os](shape v)       { os << "shape:" << v;               },
                [&os](other_shape v) { os << "other_shape:" << v;         },
                [&os](auto const& v) { os << "custom:" << std::quoted(v); },
            }, v);
            return os;
        }
    }
    
    BOOST_FUSION_ADAPT_STRUCT(AST::result, bla, pos, bloe)
    
    namespace parser {
        using namespace boost::spirit::x3;
    
        struct shape_t : symbols<AST::shape> {
            shape_t() { add
                ("ellipse", AST::shape::ellipse)
                ("circle", AST::shape::circle)
                ;
            }
        } shape_sym;
    
        struct other_shape_t : symbols<AST::other_shape> {
            other_shape_t() { add
                ("square", AST::other_shape::square)
                ("rectangle", AST::other_shape::rectangle)
                ;
            }
        } other_shape_sym;
    
        struct position_t : symbols<AST::position> {
            position_t() { add
                ("top", AST::position::top)
                ("left", AST::position::left)
                ("right", AST::position::right)
                ("bottom", AST::position::bottom)
                ("center", AST::position::center)
                ;
            }
        } position_sym;
    
        // hack for BOL state
        struct state_t {
            bool at_bol = true;
    
            struct set_bol {
                template <typename Ctx> void operator()(Ctx& ctx) const {
                    auto& s = get<state_t>(ctx);
                    //std::clog << std::boolalpha << "set_bol (from " << s.at_bol << ")" << std::endl;
                    s.at_bol = true;
                }
            };
    
            struct reset_bol {
                template <typename Ctx> void operator()(Ctx& ctx) const {
                    auto& s = get<state_t>(ctx);
                    //std::clog << std::boolalpha << "reset_bol (from " << s.at_bol << ")" << std::endl;
                    s.at_bol = false;
                }
            };
    
            struct is_at_bol {
                template <typename Ctx> void operator()(Ctx& ctx) const {
                    auto& s = get<state_t>(ctx);
                    //std::clog << std::boolalpha << "is_at_bol (" << s.at_bol << ")" << std::endl;
                    _pass(ctx) = s.at_bol;
                }
            };
        };
        auto const SET_BOL   = eps[ state_t::set_bol{} ];
        auto const RESET_BOL = eps[ state_t::reset_bol{} ];
        auto const AT_BOL    = eps[ state_t::is_at_bol{} ];
    
        ////////////////
        // helpers - attribute coercion
        template <typename T>
        auto as  = [](auto p) {
            return rule<struct _, T, true> {typeid(T).name()} = p;
        };
        template <typename T>
        auto opt = [](auto p, T defval = {}) {
            return as<T>(p >> RESET_BOL | attr(defval));
        };
    
        // keyword boundary detection
        auto identchar = alnum | char_("-_.");
        auto kw  = [](auto p) { return lexeme[p >> !identchar]; };
        auto ikw = [](auto p) { return no_case[kw(p)]; };
    
        auto const EOL = eol|eoi;
        ////////////////
    
        auto const data = as<AST::data>(double_ % ',');
        auto const position = kw("at") >> position_sym;
    
        auto const custom_shape =
                !(position|data) >> as<std::string>(kw(+identchar));
        auto const any_shape = as<AST::any_shape>(
                ikw(shape_sym) | ikw(other_shape_sym) | custom_shape);
    
        auto const shape_line = as<AST::result>(
                with<state_t>(state_t{}) [
                    SET_BOL >>
                    opt<AST::any_shape>(any_shape) >>
                    opt<AST::position>(position) >>
                    (AT_BOL|','|&EOL) >> -data
                ]);
        auto const shapes = skip(blank) [ shape_line % eol ]/* >> eoi*/;
    }
    
    int main() {
        for (std::string const input : {
                "circle at center, 1, 2, 3",
                "at top, 1, 2, 3",
                "at bottom",
                "1, 2, 3",
                "my_fancy_shape at right, 1",
                R"(circle at center, 1, 2, 3
                   at top, 1, 2, 3
                   at bottom
                   circle, 1, 2, 3
                   1, 2, 3
                   my_fancy_shape at right, 1)",
    
                // invalids:
                "circle, 1, 2 3",
                ", 1, 2, 3",
                "circle 1, 2, 3",
                })
        {
            std::cout << " ==== " << std::quoted(input) << std::endl;
            std::vector<AST::result> r;
            auto f = begin(input), l = end(input);
            if (parse(f, l, parser::shapes, r)) {
                std::cout << "Parsed " << r.size() << " shapes" << std::endl;
                for (auto const& s : r) {
                    std::cout << s.bla << " at " << s.pos;
                    for (auto v : s.bloe)
                        std::cout << ", " << v;
                    std::cout << std::endl;
                }
            } else {
                std::cout << "Parse failed" << std::endl;
            }
    
            if (f!=l) {
                std::cout << "Remaining unparsed input: " << std::quoted(std::string(f,l)) << std::endl;
            }
        }
    }
    

    ¹ Wandbox has a more recent version of Boost than Coliru, making with<> directive states mutable as intended.