
Boost::spirit::qi fails to compile for a string matcher


I'm trying to write a parser using boost::spirit::qi that parses everything between a pair of " characters as-is, while allowing " to be escaped. For example, "ab\n\"" should return ab\n\". I've tried the following code (godbolt link):

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;

int main() {
    std::string input{R"("ab\n\"")"};
    std::cout << "[" << input << "]\n";

    std::string output;

    using Skipper = qi::rule<std::string::const_iterator>;
    Skipper skip = qi::space;
    qi::rule<std::string::const_iterator, std::string(), Skipper> qstring;

    qstring %= qi::lit("\"") 
        > ( *( (qi::print - qi::lit('"') - qi::lit("\\")) | (qi::char_("\\") > qi::print) ) )
                                                            //   ^^^^^
        > qi::lit("\"");

    auto success = qi::phrase_parse(input.cbegin(), input.cend(), qstring, skip, output);

    if (!success) {
        std::cout << "Failed to parse";
        return 1;
    }
    
    std::cout << "output = [" << output << "]\n";

    return 0;
}

This fails to compile with template errors:

/opt/compiler-explorer/libs/boost_1_81_0/boost/spirit/home/support/container.hpp:130:12: error: 'char' is not a class, struct, or union type
  130 |     struct container_value
      |            ^~~~~~~~~~~~~~~
.....
/opt/compiler-explorer/libs/boost_1_81_0/boost/spirit/home/qi/detail/pass_container.hpp:320:66: error: no type named 'type' in 'struct boost::spirit::traits::container_value<char, void>'
  320 |             typedef typename traits::container_value<Attr>::type value_type;

I can get the code to compile if I replace the marked qi::char_("\\") with qi::lit("\\"), but that doesn't create an attribute for the \ which it matches. I've also found that it compiles if I create a separate rule that contains just the Kleene star, but is there a way to get Boost to use the correct types in a single expression?

qi::rule<std::string::const_iterator, std::string(), Skipper> qstring;
qi::rule<std::string::const_iterator, std::string(), Skipper> qstringbody;

qstringbody %= ( *( (qi::print - qi::lit('"') - qi::lit("\\")) | (qi::char_("\\") > qi::print) ) );
qstring %= qi::lit("\"") 
    > qstringbody
    > qi::lit("\"");

Solution

  • qi::char_("\\") with qi::lit("\\"), but that doesn't create an attribute for the \ which it matches

    This is exactly what you require. Parsing should translate the input representation (syntax) into your meaningful representation (semantics). It is possible to have an AST that reflects escapes, of course, but then you would NOT be parsing into a string, but into something like

    struct char_or_escape {
          enum { hex_escape, octal_escape, C_style_char_esc, unicode_codepoint_escape, named_unicode_escape } type;
          std::variant<uint32_t, std::string> value;
    };
    using StringAST = std::vector<char_or_escape>;
    

    Presumably, you don't want to keep the raw input (otherwise, qi::raw[] is your friend).

    Applying It

    Here's my simplification

    qi::rule<It, std::string(), Skipper> qstring //
        = '"' > *(qi::print - '"' - "\\" | "\\" > qi::print) > '"';
    

    Side note: your grammar seems to accept printable characters only. I'll remove that assumption in the following. You can, of course, reintroduce character subsets as you require.

    qstring = '"' > *(~qi::char_("\"\\") | '\\' > qi::char_) > '"';
    

    Reordering the branches removes the need to exclude '\\' explicitly, while being more expressive about intent:

    qstring = '"' > *('\\' > qi::char_ | ~qi::char_('"')) > '"';
    

    Now, from the example input I gather that you might require a C-style treatment of escapes. May I suggest:

    qi::symbols<char, char> c_esc;
    c_esc.add("\\\\", '\\')                                                            //
        ("\\a", '\a')("\\b", '\b')("\\n", '\n')("\\f", '\f')("\\t", '\t')("\\r", '\r') //
        ("\\v", '\v')("\\0", '\0')("\\e", 0x1b)("\\'", '\'')("\\\"", '"')("\\?", 0x3f);
    
    qstring = '"' > *(c_esc | '\\' >> qi::char_ | ~qi::char_('"')) > '"';
    

    (Note that some of these entries are redundant: \\, \', \" and \? map to the character that follows the backslash, which the generic '\\' >> qi::char_ branch already produces.)

    Demo

    Live On Coliru

    #include <boost/spirit/include/qi.hpp>
    #include <iomanip>
    #include <iostream>
    #include <string>
    
    namespace qi = boost::spirit::qi;
    
    int main() {
        using It = std::string::const_iterator;
    
        using Skipper = qi::space_type;
        qi::rule<It, std::string(), Skipper> qstring;
    
        qi::symbols<char, char> c_esc;
        c_esc.add("\\\\", '\\')                                                            //
            ("\\a", '\a')("\\b", '\b')("\\n", '\n')("\\f", '\f')("\\t", '\t')("\\r", '\r') //
            ("\\v", '\v')("\\0", '\0')("\\e", 0x1b)("\\'", '\'')("\\\"", '"')("\\?", 0x3f);
    
        qstring = '"' > *(c_esc | '\\' >> qi::char_ | ~qi::char_('"')) > '"';
    
        for (std::string input :
             {
                 R"("")",
                 R"("ab\n\"")",
                 R"("ab\r\n\'")",
             }) //
        {
            std::string output;
            bool success = phrase_parse(input.cbegin(), input.cend(), qstring, qi::space, output);
    
            if (!success)
                std::cout << quoted(input) << " -> FAILED\n";
            else
                std::cout << quoted(input) << " -> " << quoted(output) << "\n";
        }
    }
    

    Printing

    "\"\"" -> ""
    "\"ab\\n\\\"\"" -> "ab
    \""
    "\"ab\\r\\n\\'\"" -> "ab
    '"
    

    Further Reading

    For more complete escape handling, see here: Creating a boost::spirit::x3 parser for quoted strings with escape sequence handling (also alternative approaches instead of the symbols).

    It contains a list of even more elaborate examples (JSON style unicode escapes etc.)