Tags: c++, parsing, boost-spirit, ebnf, puredata

How can I reserve a set of keywords in a name field in boost spirit?


I have the following definition for an object record in PureData that I need to be able to parse into my generic PdObject struct:

Description:
Defines an object
Syntax:
#X obj [x_pos] [y_pos] [object_name] [p1] [p2] [p3] [...];\r\n
Parameters:
[x_pos] - horizontal position within the window
[y_pos] - vertical position within the window
[object_name] - name of the object (optional)
[p1] [p2] [p3] [...] the parameters of the object (optional)
Example:
#X obj 55 50;
#X obj 132 72 trigger bang float;

And I have created the following boost spirit rule that has been tested to work:

template <typename Iterator> struct PdObjectGrammar : qi::grammar<Iterator, PdObject()> { 
    PdObjectGrammar() : PdObjectGrammar::base_type(start) { 
        using namespace qi; 
        start = skip(space)[objectRule]; 
        pdStringRule = +(('\\'  >> space) | (graph-lit(";"))); 
        objectRule = "#X obj" >> int_ >> int_ >> -(pdStringRule) >> *(pdStringRule) >> ";"; 
        BOOST_SPIRIT_DEBUG_NODES((start)(objectRule)(pdStringRule))
    }
    private: 
    qi::rule<Iterator, std::string()> pdStringRule; 
    qi::rule<Iterator, PdObject()> start; 
    qi::rule<Iterator, PdObject(), qi::space_type> objectRule; 

};

However, some "reserved names" cannot be used as ordinary object names, such as "bng", "tgl", "nbx", etc.

For example, here is another type of "obj" using a reserved name keyword that must be parsed separately by a different rule:

#X obj 92 146 bng 20 250 50 0 empty empty empty 0 -10 0 12 #fcfcfc #000000 #000000;

How can I modify my previous qi rule to not parse the above string, and leave it for another grammar to check (which would parse it to a different struct)?

Postscript:

My full test for the PdObjectGrammar is:

#define BOOST_SPIRIT_DEBUG
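#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>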


namespace qi = boost::spirit::qi;

struct PdObject {
int xPos;
int yPos;
std::string name;
std::vector<std::string> params;

};


BOOST_FUSION_ADAPT_STRUCT(
    PdObject,
    xPos,
    yPos,
    name,
    params
)

template <typename Iterator> struct PdObjectGrammar : qi::grammar<Iterator, PdObject()> { 
    PdObjectGrammar() : PdObjectGrammar::base_type(start) { 
        using namespace qi; 
        start = skip(space)[objectRule]; 
        pdStringRule = +(('\\'  >> space) | (graph-lit(";"))); 
        objectRule = "#X obj" >> int_ >> int_ >> -(pdStringRule) >> *(pdStringRule) >> ";"; 
        BOOST_SPIRIT_DEBUG_NODES((start)(objectRule)(pdStringRule))
    }
    private: 
    qi::rule<Iterator, std::string()> pdStringRule; 
    qi::rule<Iterator, PdObject()> start; 
    qi::rule<Iterator, PdObject(), qi::space_type> objectRule; 

};


int main(int argc, char** argv)
{
    if (argc != 2)
    {
        std::cout << "Usage: " << argv[0] << " <PatchFile>" << std::endl;
        return 1;
    }

    std::ifstream inputFile(argv[1]); 
    std::string inputString(std::istreambuf_iterator<char>(inputFile), {}); 

    PdObject msg;
    PdObjectGrammar<std::string::iterator> parser; 

    bool success = qi::phrase_parse(inputString.begin(), inputString.end(), parser, boost::spirit::ascii::space, msg); 
    std::cout << "Success: " << success << std::endl;

    return 0; 

}

Solution

  • In a way "keywordness" is not part of the grammar. It's a semantic check.

    There is no standard way in which grammars deal with keywords. For example, C++ has a number of identifiers that are reserved only in certain contexts.

    The short story of it is you will just have to express your constraints in code or validate semantics after-the-fact (on the parsed result).
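
Validating after the fact can be as simple as checking the parsed name against the reserved list. The sketch below is a minimal illustration, assuming the PdObject struct from the question. The helper names isReservedName and isGenericObject are hypothetical, and the list of names is the one used in the rules later in this answer.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <vector>

// Same layout as the PdObject struct in the question.
struct PdObject {
    int xPos;
    int yPos;
    std::string name;
    std::vector<std::string> params;
};

// Hypothetical helper: true when `name` is exactly one of the reserved names.
inline bool isReservedName(std::string const& name) {
    static const std::array<char const*, 9> reserved = {{
        "bng", "tgl", "nbx", "vsl", "hsl", "vradio", "hradio", "vu", "cnv"}};
    return std::any_of(reserved.begin(), reserved.end(),
                       [&name](char const* kw) { return name == kw; });
}

// Hypothetical post-parse check: a successfully parsed record only counts as a
// generic object when its name is not reserved (an empty name is allowed).
inline bool isGenericObject(PdObject const& obj) {
    return !isReservedName(obj.name);
}
```

A caller would run the grammar first and then reject (or re-route) any result for which isGenericObject returns false.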

    Naively:

    string     = +('\\' >> qi::space | qi::graph - ";");
    name       = string - "bng" - "tgl" - "nbx" - "vsl" - "hsl" - "vradio" - "hradio" - "vu" - "cnv";
    object     = "#X obj"       //
        >> qi::int_ >> qi::int_ //
        >> -name                //
        >> *string >> ";";
    

    Or, with a negative lookahead:

    string     = +('\\' >> qi::space | qi::graph - ";");
    builtin    = qi::lit("bng") | "tgl" | "nbx" | "vsl" | "hsl" | "vradio" | "hradio" | "vu" | "cnv";
    object     = "#X obj"        //
        >> qi::int_ >> qi::int_  //
        >> -(!builtin >> string) //
        >> *string >> ";";
    

    Symbols

    You can make this a bit more elegant, more maintainable and possibly more efficient by defining a symbol table for it:

    qi::symbols<char> builtin;
    
    
    // ...
    builtin += "bng", "tgl", "nbx", "vsl", "hsl", "vradio", "hradio", "vu", "cnv";
    
    string = +('\\' >> qi::space | qi::graph - ";");
    object = "#X obj"                //
             >> qi::int_ >> qi::int_ //
             >> -(string - builtin)    //
             >> *string >> ";";
    

    Distinct Keywords

    There's a flaw. When the user gives an object a name that merely starts with one of the builtins, like bngalore or vslander, the builtin matches that prefix and the name is wrongly rejected.

    To account for this, make sure the keyword ends on a lexeme boundary:

    auto kw = [](auto const& p) { return qi::copy(qi::lexeme[p >> !(qi::graph - ';')]); };
    string = +('\\' >> qi::space | qi::graph - ";");
    object = "#X obj"                //
        >> qi::int_ >> qi::int_      //
        >> -(!kw(builtin) >> string) //
        >> *string >> ";";
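
The boundary check itself is independent of Spirit. As a rough plain-C++ illustration of what p >> !(qi::graph - ';') tests, the hypothetical function keywordAt below accepts a keyword only when it is followed by the end of input, a ';' or a non-graphical character such as a space. It is a sketch of the idea, not a replacement for the rule above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true when `kw` occurs at `pos` in `input` and is
// followed by a token boundary (end of input, ';' or a non-graphical char).
bool keywordAt(std::string const& input, std::size_t pos, std::string const& kw) {
    if (pos > input.size())
        return false; // out of range, nothing to match
    if (input.compare(pos, kw.size(), kw) != 0)
        return false; // the keyword text itself does not match here

    std::size_t const next = pos + kw.size();
    if (next >= input.size())
        return true; // keyword runs up to the end of the input

    unsigned char const c = static_cast<unsigned char>(input[next]);
    return c == ';' || !std::isgraph(c);
}
```

With this check, "bng 20 250;" starts with the keyword bng, while "bngalore 1;" does not, because the character after the matched prefix is still part of the name.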
    

    It doesn't work!

    That's because the grammar is flawed. In your defense, the specification is extremely sloppy. It's one of those grammars alright.

    With all those things being optional, you should ask yourself: how does the parser know that the name is omitted when there are parameters? As far as I can see, the parser could never tell, so presumably there cannot be parameters when the name is omitted.

    We can express that:

    string = +('\\' >> qi::space | qi::graph - ";");
    object = "#X obj"                                   //
        >> qi::int_ >> qi::int_                         //
        >> !kw(builtin) >> -(string >> *string) >> ";"; //
    

    Oh noes, now the entire (string >> *string) is compatible with just the name attribute...:

    Input: "#X obj 132 72 trigger bang float;"
     -> (132 72 "triggerbangfloat" { })
    

    Here I'd advise to adjust the AST to reflect the parsed grammar:

    struct GenericObject {
        String              name;
        std::vector<String> params;
    };
    
    struct PdObject {
        int           xPos, yPos;
        GenericObject generic;
    };
    
    BOOST_FUSION_ADAPT_STRUCT(PdObject, xPos, yPos, generic)
    BOOST_FUSION_ADAPT_STRUCT(GenericObject, name, params)
    

    Now it does propagate the attributes correctly. Note the extra sub-object (the inner parentheses) in the output:

    Input: "#X obj 132 72 trigger bang float;"
     -> (132 72 ("trigger" { "bang" "float" }))
    

    Taking It All The Way

    As a pro tip, don't implement the parser in the same sloppy fashion as the specification was done. Likely, you just want to parse different object types with dedicated AST types and ditto rules.

    For really advanced/pluggable grammars, you might dispatch the rules based on the name symbol. That's known as the Nabialek Trick.
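
The idea of dispatching on the object name can be shown without Spirit. The sketch below is only a plain-C++ analogy, not the Nabialek Trick itself (which typically stores rule pointers in a qi::symbols table). A hypothetical lookup table maps a name to a handler, and unknown names fall back to a generic description. All names here (Params, Handler, describeObject) are made up for illustration.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

using Params  = std::vector<std::string>;
using Handler = std::function<std::string(Params const&)>;

// Hypothetical dispatcher: picks a handler by object name and falls back to a
// generic description when no dedicated handler is registered.
std::string describeObject(std::string const& name, Params const& params) {
    static const std::map<std::string, Handler> handlers = {
        {"vsl", [](Params const& p) { return "vslider with " + std::to_string(p.size()) + " fields"; }},
        {"bng", [](Params const& p) { return "bang with " + std::to_string(p.size()) + " fields"; }},
    };

    auto const it = handlers.find(name);
    if (it != handlers.end())
        return it->second(params);
    return "generic object \"" + name + "\"";
}
```

Adding support for another object type then means registering one more entry in the table, which is the property that makes the dispatch approach attractive for pluggable grammars.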

    Let's generalize our object rule:

    object = "#X obj"           //
        >> qi::int_ >> qi::int_ //
        >> definition           //
        >> ";"                  //
        ;
    

    Now let's demo the VSL rule, in addition to generic objects:

    definition = vslider | generic;
    

    Generic is still what we had before:

    generic           //
        = opt(string) // name
        >> *string;   // params
    

    Let's do a rough take on Vslider:

    vslider                             //
        = qi::lexeme["vsl" >> boundary] //
        >> opt(qi::uint_)               // width
        >> opt(qi::uint_)               // height
        >> opt(qi::double_)             // bottom
        >> opt(qi::double_)             // top
        >> opt(bool_)                   // log
        >> opt(bool_)                   // init
        >> opt(string)                  // send
        >> opt(string)                  // receive
        >> opt(string)                  // label
        >> opt(qi::int_)                // x_off
        >> opt(qi::int_)                // y_off
        >> opt(string)                  // font
        >> opt(qi::uint_)               // fontsize
        >> opt(rgb)                     // bg_color
        >> opt(rgb)                     // fg_color
        >> opt(rgb)                     // label_color
        >> opt(qi::double_)             // default_value
        >> opt(bool_)                   // steady_on_click
        ;
    

    Of course we need a few helpers:

    qi::uint_parser<int32_t, 16, 6, 6> hex6{};
    rgb = ('#' >> hex6) | qi::int_;
    
    auto boundary = qi::copy(!(qi::graph - ';'));
    auto opt = [](auto const& p) { return qi::copy(p | &qi::lit(';')); };
    
    bool_ = qi::bool_ | qi::uint_parser<bool, 2, 1, 1>{};
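
To make the intent of the rgb helper concrete, here is a rough plain-C++ counterpart of ('#' >> hex6) | qi::int_. It is only a sketch: parseRgb is a hypothetical name, and the decimal branch relies on std::strtol, which is not guaranteed to accept exactly the same inputs as qi::int_.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <string>

// Hypothetical helper: parses either "#rrggbb" (exactly six hex digits) or a
// plain decimal integer. Returns false when the token has neither shape.
bool parseRgb(std::string const& token, std::int32_t& out) {
    if (token.empty())
        return false;

    if (token[0] == '#') {
        if (token.size() != 7)
            return false; // '#' plus exactly six digits
        std::int32_t value = 0;
        for (std::size_t i = 1; i < token.size(); ++i) {
            unsigned char const c = static_cast<unsigned char>(token[i]);
            if (!std::isxdigit(c))
                return false;
            int const digit = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
            value = value * 16 + digit;
        }
        out = value;
        return true;
    }

    // Decimal branch: the whole token must be consumed by the conversion.
    char const* const begin = token.c_str();
    char* end = nullptr;
    long const value = std::strtol(begin, &end, 10);
    if (end == begin || *end != '\0')
        return false;
    if (value < std::numeric_limits<std::int32_t>::min() ||
        value > std::numeric_limits<std::int32_t>::max())
        return false;
    out = static_cast<std::int32_t>(value);
    return true;
}
```

For the tokens used in the demo inputs, "#fcfcfc" yields 16579836 and "-262144" yields -262144, while a word such as "empty" is rejected.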
    

    And the AST types:

    struct RGB {
        int32_t rgb;
    };
    
    namespace Defs {
        using boost::optional;
    
        struct Generic {
            String              name;
            std::vector<String> params;
        };
    
        struct Vslider {
            optional<unsigned> width;           // horizontal size of gui element
            optional<unsigned> height;          // vertical size of gui element
            optional<double>   bottom;          // minimum value
            optional<double>   top;             // maximum value
            bool               log = false;     // when set, the slider range is output
                                                // logarithmically; otherwise its output
                                                // is linear
            String           init;              // sends default value on patch load
            String           send;              // send symbol name
            String           receive;           // receive symbol name
            optional<String> label;             // label
            int              x_off = 0;         // horizontal position of the label
                                                // text relative to the upper-left
                                                // corner of the object
            int y_off = 0;                      // vertical position of the label
                                                // text relative to the upper-left
                                                // corner of the object
            optional<String>   font;            // font type
            optional<unsigned> fontsize;        // font size
            optional<RGB>      bg_color;        // background color
            optional<RGB>      fg_color;        // foreground color
            optional<RGB>      label_color;     // label color
            optional<double>   default_value;   // default value times hundred
            optional<bool>     steady_on_click; // when set, fader is steady on click,
                                                // otherwise it jumps on click
        };
    
        using Definition = boost::variant<Vslider, Generic>;
    } // namespace Defs
    
    using Defs::Definition;
    
    struct PdObject {
        int        xPos, yPos;
        Definition definition;
    };
    

    Putting it all together:

    Full Demo


    // #define BOOST_SPIRIT_DEBUG
    #include <boost/core/demangle.hpp>
    #include <boost/fusion/adapted.hpp>
    #include <boost/fusion/include/io.hpp>
    #include <boost/optional/optional_io.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <iomanip>
    #include <iostream>
    
    namespace Ast {
        // C++ makes it hard to pretty-print containers...
        struct print_hack : std::char_traits<char> {};
        using String = std::basic_string<char, print_hack>;
        static inline std::ostream& operator<<(std::ostream& os, String const& s) { return os << quoted(s); }
        static inline std::ostream& operator<<(std::ostream& os, std::vector<String> const& ss) {
            os << "{";
            for (auto& s : ss) os << " " << s;
            return os << " }";
        }
    
        struct RGB {
            int32_t rgb;
        };
    
        namespace Defs {
            using boost::optional;
    
            struct Generic {
                String              name;
                std::vector<String> params;
            };
    
            struct Vslider {
                optional<unsigned> width;           // horizontal size of gui element
                optional<unsigned> height;          // vertical size of gui element
                optional<double>   bottom;          // minimum value
                optional<double>   top;             // maximum value
                bool               log = false;     // when set, the slider range is output
                                                    // logarithmically; otherwise its output
                                                    // is linear
                String           init;              // sends default value on patch load
                String           send;              // send symbol name
                String           receive;           // receive symbol name
                optional<String> label;             // label
                int              x_off = 0;         // horizontal position of the label
                                                    // text relative to the upper-left
                                                    // corner of the object
                int y_off = 0;                      // vertical position of the label
                                                    // text relative to the upper-left
                                                    // corner of the object
                optional<String>   font;            // font type
                optional<unsigned> fontsize;        // font size
                optional<RGB>      bg_color;        // background color
                optional<RGB>      fg_color;        // foreground color
                optional<RGB>      label_color;     // label color
                optional<double>   default_value;   // default value times hundred
                optional<bool>     steady_on_click; // when set, fader is steady on click,
                                                    // otherwise it jumps on click
            };
    
            using Definition = boost::variant<Generic, Vslider>;
    
            using boost::fusion::operator<<;
        } // namespace Defs
    
        using Defs::Definition;
    
        struct PdObject {
            int        xPos, yPos;
            Definition definition;
        };
    
        using boost::fusion::operator<<;
    }
    
    BOOST_FUSION_ADAPT_STRUCT(Ast::Defs::Vslider, width, height, bottom, top, log, init, send, receive, label,
                              x_off, y_off, font, fontsize, bg_color, fg_color, label_color, default_value,
                              steady_on_click)
    BOOST_FUSION_ADAPT_STRUCT(Ast::Defs::Generic, name, params)
    BOOST_FUSION_ADAPT_STRUCT(Ast::RGB, rgb)
    BOOST_FUSION_ADAPT_STRUCT(Ast::PdObject, xPos, yPos, definition)
    
    namespace qi = boost::spirit::qi;
    
    template <typename Iterator> struct PdObjectGrammar : qi::grammar<Iterator, Ast::PdObject()> {
        PdObjectGrammar() : PdObjectGrammar::base_type(start) {
            start = qi::skip(qi::blank)[ object ];
    
            /* #X obj [x_pos] [y_pos] [object_name] [p1] [p2] [p3] [...];\r\n
             * Parameters:
             *  [x_pos] - horizontal position within the window
             *  [y_pos] - vertical position within the window
             *  [object_name] - name of the object (optional)
             *  [p1] [p2] [p3] [...] the parameters of the object (optional)
             */
            qi::uint_parser<int32_t, 16, 6, 6> hex6{};
            rgb = ('#' >> hex6) | qi::int_;
    
            auto boundary = qi::copy(!(qi::graph - ';'));
            auto opt = [](auto const& p) { return qi::copy(p | &qi::lit(';')); };
    
            bool_ = qi::bool_ | qi::uint_parser<bool, 2, 1, 1>{};
    
            vslider                             //
                = qi::lexeme["vsl" >> boundary] //
                >> opt(qi::uint_)               // width
                >> opt(qi::uint_)               // height
                >> opt(qi::double_)             // bottom
                >> opt(qi::double_)             // top
                >> opt(bool_)                   // log
                >> opt(bool_)                   // init
                >> opt(string)                  // send
                >> opt(string)                  // receive
                >> opt(string)                  // label
                >> opt(qi::int_)                // x_off
                >> opt(qi::int_)                // y_off
                >> opt(string)                  // font
                >> opt(qi::uint_)               // fontsize
                >> opt(rgb)                     // bg_color
                >> opt(rgb)                     // fg_color
                >> opt(rgb)                     // label_color
                >> opt(qi::double_)             // default_value
                >> opt(bool_)                   // steady_on_click
                ;
    
            generic           //
                = opt(string) // name
                >> *string;   // params
    
            definition = vslider | generic;
    
            string = +('\\' >> qi::space | qi::graph - ";");
            object = "#X obj"           //
                >> qi::int_ >> qi::int_ //
                >> definition           //
                >> ";"                  //
                ;
    
            BOOST_SPIRIT_DEBUG_NODES(          //
                (start)(object)(string)(rgb)   //
                (definition)(vslider)(generic) //
                (bool_))                       //
        }
    
      private:
        using Skipper = qi::blank_type;
        qi::rule<Iterator, Ast::PdObject(),         Skipper> object;
        qi::rule<Iterator, Ast::Defs::Vslider(),    Skipper> vslider;
        qi::rule<Iterator, Ast::Defs::Generic(),    Skipper> generic;
        qi::rule<Iterator, Ast::Defs::Definition(), Skipper> definition;
    
        // lexemes
        qi::rule<Iterator, bool()>          bool_;
        qi::rule<Iterator, Ast::RGB()>      rgb;
        qi::rule<Iterator, Ast::String()>   string;
        qi::rule<Iterator, Ast::PdObject()> start;
    };
    
    int main()
    {
        PdObjectGrammar<std::string::const_iterator> const parser;
    
        for (std::string const input :
             {
                 "#X obj 55 50;",
                 "#X obj 92 146 bng 20 250 50 0 empty empty empty 0 -10 0 12 #fcfcfc #000000 #000000;",
                 "#X obj 50 38 vsl 15 128 0 127 0 0 empty empty empty 0 -8 0 8 -262144 -1 -1 0 1;",
             }) //
        {
            Ast::PdObject msg;
    
            auto f = input.begin(), l = input.end();
            std::cout << "Input: " << quoted(input) << std::endl;
            if (qi::parse(f, l, parser, msg)) {
                std::cout << " -> " << boost::core::demangle(msg.definition.type().name()) << std::endl;
                std::cout << " -> " << msg << std::endl;
            } else
                std::cout << " -> FAILED" << std::endl;
    
            if (f != l)
                std::cout << " Remaining: " << quoted(std::string(f, l)) << std::endl;
        }
    }
    

    Prints

    Input: "#X obj 55 50;"
     -> Ast::Defs::Generic
     -> (55 50 ("" { }))
    Input: "#X obj 92 146 bng 20 250 50 0 empty empty empty 0 -10 0 12 #fcfcfc #000000 #000000;"
     -> Ast::Defs::Generic
     -> (92 146 ("bng" { "20" "250" "50" "0" "empty" "empty" "empty" "0" "-10" "0" "12" "#fcfcfc" "#000000" "#000000" }))
    Input: "#X obj 50 38 vsl 15 128 0 127 0 0 empty empty empty 0 -8 0 8 -262144 -1 -1 0 1;"
     -> Ast::Defs::Vslider
     -> (50 38 ( 15  128  0  127 0 "" "empty" "empty"  "empty" 0 -8  "0"  8  (-262144)  (-1)  (-1)  0  1))
    

    Note how bng is parsed as Generic by default, simply because we haven't added a definition rule for it yet. With a rule for it added, the output becomes:

    Input: "#X obj 55 50;"
     -> Ast::Defs::Generic
     -> (55 50 ("" { }))
    Input: "#X obj 92 146 bng 20 250 50 0 empty empty empty 0 -10 0 12 #fcfcfc #000000 #000000;"
     -> Ast::Defs::Bang
     -> (92 146 ( 20  250  2 "" "empty" "empty" "empty"  0  -10  "0"  12  (16579836)  (0)  (0)))
    Input: "#X obj 50 38 vsl 15 128 0 127 0 0 empty empty empty 0 -8 0 8 -262144 -1 -1 0 1;"
     -> Ast::Defs::Vslider
     -> (50 38 ( 15  128  0  127 0 "" "empty" "empty"  "empty" 0 -8  "0"  8  (-262144)  (-1)  (-1)  0  1))
    

    That was basically 1:1 copy-paste from the PureData grammar docs.

    Of course, my fingers itch to remove the duplication of init, send, receive, label, x_off, y_off, font, fontsize, bg_color, fg_color and label_color... But I'll leave it as an exorcism for the reader.