c++ boost boost-spirit-qi

Using too many alternative operators in boost::spirit::qi causes a segmentation fault


I wrote a simple program using Boost.Spirit Qi:

#include <boost/spirit/include/qi.hpp>
#include <string>

using namespace std;
using namespace boost::spirit;
int main() {
    string str = "string";
    auto begin = str.begin();
    auto symbols = (qi::lit(";") | qi::lit("(") | qi::lit(")") | qi::lit("+") |
                    qi::lit("/") | qi::lit("-") | qi::lit("*"));
    qi::parse(begin, str.end(), *(qi::char_ - symbols));
}

This program was terminated with a SEGV. I then rewrote the code with fewer alternative operators on the right-hand side of symbols:

#include <boost/spirit/include/qi.hpp>
#include <string>
using namespace std;
using namespace boost::spirit;
int main()
{
    string str = "string";
    auto begin = str.begin();
    auto symbols = (qi::lit(";") | qi::lit("+") | qi::lit("/") | qi::lit("-") |
            qi::lit("*"));
    qi::parse(begin, str.end(), *(qi::char_ - symbols));
}

This version works. What is the difference between the two cases?


Solution

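  • Your problem is a classic mistake: using auto to store Qi parser expressions (see "Assigning parsers to auto variables").

    That leads to UB. The Proto expression tree holds its sub-expressions by reference, so the temporaries created by the qi::lit(...) calls are gone at the end of the statement and symbols is left with dangling references. Because it is UB, the shorter version only appears to work; nothing about the number of alternatives makes it correct.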

    • Use a rule, or qi::copy (which is proto::deep_copy under the hood).

      auto symbols = qi::copy(qi::lit(";") | qi::lit("(") | qi::lit(")") | qi::lit("+") |
           qi::lit("/") | qi::lit("-") | qi::lit("*"));
      
    • Even better, use a character set to match all the characters at once,

      auto symbols = qi::copy(qi::omit(qi::char_(";()+/*-")));
      

      The omit[] counteracts the fact that char_ exposes its attribute (where lit doesn't). But since all you ever use it for is to SUBTRACT from another character set:

      qi::char_ - symbols
      

      You could just as well write

      qi::char_ - qi::char_(";()+/*-")
      

      Now. You might not know, but you can use ~charset to negate it, so it would just become

      ~qi::char_(";()+/*-")
      

      NOTE: - can have special meaning in charsets (it denotes a range, as in a-z), which is why I very subtly moved it to the end. See docs

    Live Demo

    Extending a little and showing some subtler patterns:


    #include <boost/spirit/include/qi.hpp>
    #include <iomanip>
    #include <iostream>
    #include <string>
    #include <vector>
    
    using namespace std;
    using namespace boost::spirit;
    int main() {
        string const str = "string;some(thing) else + http://me@host:*/path-element.php";
        
        auto cs = ";()+/*-";
        using qi::char_;
    
        {
            std::vector<std::string> tokens;
            qi::parse(str.begin(), str.end(), +~char_(cs) % +char_(cs), tokens);
    
            std::cout << "Condensing: ";
            for (auto& tok : tokens) {
                std::cout << " " << std::quoted(tok);
            }
            std::cout << std::endl;
        }
    
        {
            std::vector<std::string> tokens;
            qi::parse(str.begin(), str.end(), *~char_(cs) % char_(cs), tokens);
    
            std::cout << "Not condensing: ";
            for (auto& tok : tokens) {
                std::cout << " " << std::quoted(tok);
            }
            std::cout << std::endl;
        }
    }
    

    Prints

    Condensing:  "string" "some" "thing" " else " " http:" "me@host:" "path" "element.php"
    Not condensing:  "string" "some" "thing" " else " " http:" "" "me@host:" "" "path" "element.php"
    

    X3

    If you have C++14, you can use Spirit X3, which doesn't have the "auto problem" (because it doesn't have Proto expression trees that can end up with dangling references).

    Your original code would have been fine in X3, and it will compile a lot faster.

    Here's my example using X3:


    #include <boost/spirit/home/x3.hpp>
    #include <iostream>
    #include <iomanip>
    #include <string>
    #include <vector>
    
    namespace x3 = boost::spirit::x3;
    int main() {
        std::string const str = "string;some(thing) else + http://me@host:*/path-element.php";
        auto const cs = x3::char_(";()+/*-");
    
        std::vector<std::string> tokens;
        x3::parse(str.begin(), str.end(), +~cs % +cs, tokens);
        //x3::parse(str.begin(), str.end(), *~cs % cs, tokens);
    
        for (auto& tok : tokens) {
            std::cout << " " << std::quoted(tok);
        }
    }
    

    Printing

     "string" "some" "thing" " else " " http:" "me@host:" "path" "element.php"