c++ boost polymorphism boost-spirit boost-spirit-qi

How can I use polymorphic attributes with boost::spirit::qi parsers?


I would like my boost::spirit-based parser to be able to parse a file, convert the parsed rules into different types, and emit a vector containing all of the matches it found. All of the types that are emitted as attributes should be inherited from a base type, for example:

#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/foreach.hpp>
#include <iostream>
#include <string>
#include <vector>

struct CommandBase
{
   virtual void commandAction()
   {
     std::cout << "This is a base command. You should never see this!" << std::endl;
     //Boost::spirit seems to get mad if I make this pure virtual. Clearly I'm doing it wrong.
   }
};

struct CommandTypeA : public CommandBase
{
   int valueA;
   int valueB;
   virtual void commandAction()
   {
      std::cout << "CommandType A! ValueA: " << valueA << " ValueB: " << valueB << std::endl;
   }

};

struct CommandTypeB : public CommandBase
{
   double valueA;
   std::vector<char> valueB;
   virtual void commandAction()
   {
      std::cout << "CommandType B! valueA: " << valueA << " string: " << std::string(valueB.begin(), valueB.end()) << std::endl;
   }
};
struct CommandTypeC : public CommandBase
{
  //Represents a sort of "subroutine" type where multiple commands can be grouped together
  std::vector<char> labelName;
  std::vector<boost::shared_ptr<CommandBase> > commands;
  virtual void commandAction()
  {
      std::cout << "Subroutine: " << std::string(labelName.begin(), labelName.end())
                << " has " << commands.size() << " commands:" << std::endl;
      BOOST_FOREACH(boost::shared_ptr<CommandBase> c, commands)
      {
           c->commandAction();
      }          
  }
};

Now, my attempted parser code:

namespace ascii = boost::spirit::ascii;
namespace qi = boost::spirit::qi;
using qi::lit;

BOOST_FUSION_ADAPT_STRUCT(
   CommandTypeA,
   (int, valueA)
   (int, valueB)
)

BOOST_FUSION_ADAPT_STRUCT(
   CommandTypeB,
   (double, valueA)
   (std::vector<char>, valueB)
)

BOOST_FUSION_ADAPT_STRUCT(
   CommandTypeC,
   (std::vector<char>, labelName)
   (std::vector<boost::shared_ptr<CommandBase> >, commands)
)

template<typename Iterator, typename Skipper = ascii::space_type>
struct CommandParser : qi::grammar<Iterator, std::vector<boost::shared_ptr<CommandBase> >(), Skipper>
{
   public:
   CommandParser() : CommandParser::base_type(commands)
   {
      CommandARule = qi::int_ >> qi::int_ >> lit("CMD_A");
      CommandBRule = qi::double_ >> +(qi::char_) >> lit("CMD_B");
      CommandCRule = qi::char_(':') >> lexeme[+(qi::char_ - ';' - ascii::space) >> +ascii::space] >> commands >> qi::char_(';');

      commands = +(CommandARule | CommandBRule | CommandCRule);
   }
   protected:
   qi::rule<Iterator, boost::shared_ptr<CommandTypeA>, Skipper> CommandARule;
   qi::rule<Iterator, boost::shared_ptr<CommandTypeB>, Skipper> CommandBRule;
   qi::rule<Iterator, boost::shared_ptr<CommandTypeC>, Skipper> CommandCRule;
   qi::rule<Iterator, std::vector<boost::shared_ptr<CommandBase> >, Skipper> commands;

};


std::vector<boost::shared_ptr<CommandBase> > commandList;
bool success = qi::phrase_parse(StartIterator, EndIterator, CommandParser, ascii::space, commandList);

BOOST_FOREACH(boost::shared_ptr<CommandBase> c, commandList)
{
    c->commandAction();
}

Now, this code definitely won't compile, but I hope it gets the gist across for what I'm attempting to do.

The main hangup is that qi::rules seem to want to emit the actual struct, not a reference to it.

My question is thus:

Is it possible to force qi::rule to emit a polymorphism-compatible reference like I'm attempting (if so, how), and is this the best approach for what I'm attempting to accomplish (namely a list of executable objects representing the parsed commands and their parameters)?


Solution

  • Spirit is a lot friendlier to compile-time polymorphism:

    typedef boost::variant<Command1, Command2, Command3> Command;
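
    As a rough illustration of the compile-time flavour, the sketch below uses std::variant and std::visit from C++17 as stand-ins for boost::variant and boost::apply_visitor. The command types Add and Print, the Describe visitor and the describeAll helper are hypothetical names chosen for the example; they are not part of the question's code.

```cpp
#include <string>
#include <variant>
#include <vector>

// Hypothetical command types for illustration: plain structs, no base class.
struct Add   { int lhs; int rhs; };
struct Print { std::string text; };

// The closed set of alternatives, analogous to the variant typedef above.
using Command = std::variant<Add, Print>;

// Visitor with one overload per alternative. The compiler rejects the
// std::visit call below if any alternative is left unhandled.
struct Describe {
    std::string operator()(const Add& c) const {
        return "Add: " + std::to_string(c.lhs + c.rhs);
    }
    std::string operator()(const Print& c) const {
        return "Print: " + c.text;
    }
};

// Run every command in the list through the visitor and collect the results.
std::vector<std::string> describeAll(const std::vector<Command>& commands) {
    std::vector<std::string> out;
    for (const Command& c : commands) {
        out.push_back(std::visit(Describe{}, c));
    }
    return out;
}
```

    With this approach a parser can emit a std::vector<Command> directly, and no virtual functions or heap-allocated command objects are needed.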
    

    But, let's suppose you really want to do the old-fashioned polymorphism thing...

    Just newing-up the polymorphic objects on the fly during parsing, however, is a sure-fire way to

    • make your parser bloated with semantic actions
    • create a lot of memory leaks on back-tracking in the grammar rules
    • make parsing awesomely slow (because you have all manner of dynamic allocation going on)
    • worst of all, ensure that none of this is optimized away, even when you're not actually passing an attribute reference into the top-level parse API (usually, all attribute handling "magically" vaporizes at compile-time, which is very useful for input format validation)

    So you'll want to create a holder for objects of your base-command class, or derived. Make the holder satisfy the Rule of Zero and get the actual value out by type erasure.

    (Beyond solving the "accidental" complexity and limits w.r.t. memory reclamation, a bonus to this abstraction is that you can still opt to handle the storage statically, so you save a lot of time in heap allocations.)

    I'll look at your sample to see whether I can demonstrate it quickly.

    Here is what I mean by a 'holder' class (add a virtual destructor to CommandBase!):

    struct CommandHolder
    {
        template <typename Command> CommandHolder(Command cmd) 
            : storage(new concrete_store<Command>{ std::move(cmd) }) { }
    
        operator CommandBase&() { return storage->get(); }
      private:
        struct base_store {
            virtual ~base_store() {}; 
            virtual CommandBase& get() = 0;
        };
        template <typename T> struct concrete_store : base_store {
            concrete_store(T v) : wrapped(std::move(v)) { }
            virtual CommandBase& get() { return wrapped; }
          private:
            T wrapped; 
        };
    
        boost::shared_ptr<base_store> storage;
    };
    

    I originally wanted unique_ptr for simple ownership semantics here, but I couldn't make unique_ptr work with Spirit because Spirit is simply not move-aware (Spirit X3 will be), so the holder uses shared_ptr instead. (A variant would avoid some allocation overhead as an optimization later.)

    We can trivially implement a type-erased AnyCommand based on this holder:

    struct AnyCommand : CommandBase
    {
        template <typename Command> AnyCommand(Command cmd) 
            : holder(std::move(cmd)) { }
    
        virtual void commandAction() override { 
            static_cast<CommandBase&>(holder).commandAction();
        }
      private:
        CommandHolder holder;
    };
    

    So now you can "assign" any command to an AnyCommand and use it "polymorphically" through the holder, even though the holder and AnyCommand have perfect value-semantics.

    This sample grammar will do:

    CommandParser() : CommandParser::base_type(commands)
    {
        using namespace qi;
        CommandARule = int_    >> int_           >> "CMD_A";
        CommandBRule = double_ >> lexeme[+(char_ - space)] >> "CMD_B";
        CommandCRule = ':' >> lexeme [+(graph - ';')] >> commands >> ';';
    
        command  = CommandARule | CommandBRule | CommandCRule;
        commands = +command;
    }
    

    With the rules defined as:

    qi::rule<Iterator, CommandTypeA(),            Skipper> CommandARule;
    qi::rule<Iterator, CommandTypeB(),            Skipper> CommandBRule;
    qi::rule<Iterator, CommandTypeC(),            Skipper> CommandCRule;
    qi::rule<Iterator, AnyCommand(),              Skipper> command;
    qi::rule<Iterator, std::vector<AnyCommand>(), Skipper> commands;
    

    This is quite a delightful mix of value-semantics and runtime-polymorphism :)

    The test main of

    int main()
    {
        std::string const input =
            ":group             \n"
            "     3.14  π CMD_B \n"
            "     -42  42 CMD_A \n"
            "     -inf -∞ CMD_B \n"
            "     +inf +∞ CMD_B \n"
            ";                  \n"
            "99 0 CMD_A";
    
        auto f(begin(input)), l(end(input));
    
        std::vector<AnyCommand> commandList;
        CommandParser<std::string::const_iterator> p;
        bool success = qi::phrase_parse(f, l, p, qi::space, commandList);
    
        if (success) {
            BOOST_FOREACH(AnyCommand& c, commandList) {
                c.commandAction();
            }
        } else {
            std::cout << "Parsing failed\n";
        }
    
        if (f!=l) {
            std::cout << "Remaining unparsed input '" << std::string(f,l) << "'\n";
        }
    }
    

    Prints:

    Subroutine: group has 4 commands:
    CommandType B! valueA: 3.14 string: π
    CommandType A! ValueA: -42 ValueB: 42
    CommandType B! valueA: -inf string: -∞
    CommandType B! valueA: inf string: +∞
    CommandType A! ValueA: 99 ValueB: 0
    

    See it all Live On Coliru