How to use boost::xpressive for populating a vector with structs within a semantic action


I am trying to insert a Data struct into a vector every time a match is detected, but the code does not even compile. Here is the code:

#include <string>
#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>

using namespace boost::xpressive;

struct Data
{
    int integer;
    double real;
    std::string str;

    Data(const int _integer, const double _real, const std::string& _str) : integer(_integer), real(_real), str(_str) { }
};

int main()
{
    std::vector<Data> container;

    std::string input = "Int: 0 - Real: 18.8 - Str: ABC-1005\nInt: 0 - Real: 21.3 - Str: BCD-1006\n";

    sregex parser = ("Int: " >> (s1 = _d) >> " - Real: " >> (s2 = (repeat<1,2>(_d) >> '.' >> _d)) >> " - Str: " >> (s3 = +set[alnum | '-']) >> _n)
                    [::ref(container)->*push_back(Data(as<int>(s1), as<double>(s2), s3))];

    sregex_iterator cur(input.begin(), input.end(), parser);
    sregex_iterator end;

    for(; cur != end; ++cur)
        smatch const &what = *cur;

    return 0;
}

Compilation fails at the "push_back" semantic action. I guess this is because I construct a Data object inside it and that construction cannot be evaluated lazily, but I am not really sure.

Please, could anyone help me with this?

Note: unfortunately I am tied to MS VS 2010 (not fully C++11 compliant), so please don't suggest solutions that rely on variadic templates or emplace_back. Thank you.


Solution

  • Using Xpressive

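    Every part of the semantic action has to be a lazy actor. Your Data constructor call isn't: it is evaluated eagerly, while the regex is being defined, at which point as<int>(s1), as<double>(s2) and s3 are still lazy placeholder expressions rather than an int, a double and a string. Wrapping the construction in construct<Data>(...) from regex_actions.hpp defers it until the action runs.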

    #include <string>
    #include <vector>
    #include <boost/xpressive/xpressive.hpp>
    #include <boost/xpressive/regex_actions.hpp>
    
    namespace bex = boost::xpressive;
    
    struct Data {
        int integer;
        double real;
        std::string str;
    
        Data(int integer, double real, std::string str) : integer(integer), real(real), str(str) { }
    };
    
    #include <iostream>
    
    int main() {
        std::vector<Data> container;
    
        std::string const& input = "Int: 0 - Real: 18.8 - Str: ABC-1005\nInt: 0 - Real: 21.3 - Str: BCD-1006\n";
    
        using namespace bex;
        bex::sregex const parser = ("Int: " >> (s1 = _d) >> " - Real: " >> (s2 = (repeat<1,2>(_d) >> '.' >> _d)) >> " - Str: " >> (s3 = +set[alnum | '-']) >> _n)
            [bex::ref(container)->*bex::push_back(bex::construct<Data>(as<int>(s1), as<double>(s2), s3))];
    
        bex::sregex_iterator cur(input.begin(), input.end(), parser), end;
    
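        // Classic loops, because VS 2010 has no range-based for.
        // Each step of the iterator performs the next match and runs the semantic action.
        for (; cur != end; ++cur) {
            std::cout << cur->str() << "\n";
        }
    
        for (std::vector<Data>::const_iterator it = container.begin(); it != container.end(); ++it) {
            std::cout << "[ " << it->integer << "; " << it->real << "; " << it->str << " ]\n";
        }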
    }
    

    Prints

    Int: 0 - Real: 18.8 - Str: ABC-1005
    
    Int: 0 - Real: 21.3 - Str: BCD-1006
    
    [ 0; 18.8; ABC-1005 ]
    [ 0; 21.3; BCD-1006 ]
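
    To make the eager versus lazy distinction concrete, here is a Boost-free sketch of the same idea. It is only an analogy, not how Xpressive is implemented: Captures and PushBackData are hypothetical names standing in for the sub-matches s1..s3 and for the lazy push_back(construct<Data>(...)) actor. The point is that defining the action builds no Data at all; the constructor only runs once the action is invoked with actual match text.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

struct Data {
    int integer;
    double real;
    std::string str;
    Data(int i, double r, const std::string& s) : integer(i), real(r), str(s) {}
};

// Hypothetical stand-in for the sub-matches (s1, s2, s3) of one successful match.
struct Captures {
    std::string s1, s2, s3;
};

// Hypothetical "lazy actor": it only remembers where to store results.
// The Data constructor is called inside operator(), i.e. when a match fires it.
class PushBackData {
public:
    explicit PushBackData(std::vector<Data>& out) : out_(&out) {}

    void operator()(const Captures& c) const {
        out_->push_back(Data(std::atoi(c.s1.c_str()),
                             std::atof(c.s2.c_str()),
                             c.s3));
    }

private:
    std::vector<Data>* out_;
};

// Simulates two matches firing the action and returns the filled container.
std::vector<Data> simulate_two_matches() {
    std::vector<Data> container;
    PushBackData action(container);  // defining the action constructs no Data
    assert(container.empty());

    Captures m1 = {"0", "18.8", "ABC-1005"};
    Captures m2 = {"0", "21.3", "BCD-1006"};
    action(m1);                      // construction happens here, per match
    action(m2);
    return container;
}
```

    The sketch sticks to C++03 constructs on purpose, so it should also be acceptable to VS 2010.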
    

    Using Spirit

    I'd use Spirit for this. Spirit has primitives that parse directly into the underlying data types, which is less error-prone and more efficient.

    Spirit Qi (V2)

    Using Phoenix, it's pretty similar (Live On Coliru).

    Using Fusion adaptation, it gets more interesting, and a lot simpler (Live On Coliru).

    Now imagine:

    • You wanted to match the keywords case-insensitively
    • You wanted to make whitespace insignificant
    • You wanted to accept empty lines, but not random data in between

    How would you do that in Xpressive? Here's how you'd do it with Spirit. Note how the additional constraints leave the grammar essentially unchanged. Contrast that with regex-based parsers.

    #include <boost/spirit/include/qi.hpp>
    #include <boost/fusion/adapted/struct.hpp>
    #include <string>
    #include <vector>
    namespace qi = boost::spirit::qi;
    
    struct Data {
        int integer;
        double real;
        std::string str;
    };
    
    BOOST_FUSION_ADAPT_STRUCT(Data, integer, real, str);
    
    #include <iostream>
    
    int main() {
        std::vector<Data> container;
        using It = std::string::const_iterator;
    
        std::string const& input = "iNT: 0 - Real: 18.8 - Str: ABC-1005\n\nInt: 1-Real:21.3 -sTR:BCD-1006\n\n";
    
        qi::rule<It, Data(), qi::blank_type> parser = qi::no_case[
                qi::lit("int") >> ':' >> qi::auto_ >> '-' 
                >> "real" >> ':' >> qi::auto_ >> '-' 
                >> "str" >> ':' >> +(qi::alnum|qi::char_('-')) >> +qi::eol
            ];
    
        It f = input.begin(), l = input.end();
        if (parse(f, l, qi::skip(qi::blank)[*parser], container)) {
            std::cout << "Parsed:\n";
            for(auto& r : container) {
                std::cout << "[ " << r.integer << "; " << r.real << "; " << r.str << " ]\n";
            }
        } else {
            std::cout << "Parse failed\n";
        }
    
        if (f != l) {
            std::cout << "Remaining input: '" << std::string(f,l) << "'\n";
        }
    }
    

    Still prints

    Parsed:
    [ 0; 18.8; ABC-1005 ]
    [ 1; 21.3; BCD-1006 ]
    

    Further thoughts: how would you

    • Parse scientific notation? Negative numbers?
    • Parse decimal numbers correctly? (If you are really parsing financial amounts, you may not want inexact floating-point representations.)
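
    As an illustration of the last point, here is a minimal Boost-free sketch that turns decimal text into an exact integer count of fractional units (for example cents) instead of a double. The helper name parse_fixed and its signature are made up for this example, and it does no overflow checking.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: parses "[+-]digits[.digits]" into an integer number of
// 10^-decimals units (decimals == 2 gives cents). Returns false on malformed
// input or when the text carries more fractional digits than can be stored.
// Assumption: the value fits in long long; overflow is not detected.
bool parse_fixed(const std::string& text, int decimals, long long& out) {
    assert(decimals >= 0);

    std::string::size_type i = 0;
    bool negative = false;
    if (i < text.size() && (text[i] == '-' || text[i] == '+')) {
        negative = (text[i] == '-');
        ++i;
    }

    long long value = 0;
    bool any_digit = false;

    // Integer part.
    for (; i < text.size() && text[i] >= '0' && text[i] <= '9'; ++i) {
        value = value * 10 + (text[i] - '0');
        any_digit = true;
    }

    // Optional fractional part, limited to `decimals` digits.
    int frac_digits = 0;
    if (i < text.size() && text[i] == '.') {
        ++i;
        for (; i < text.size() && text[i] >= '0' && text[i] <= '9'; ++i) {
            if (frac_digits == decimals) return false;  // would lose precision
            value = value * 10 + (text[i] - '0');
            ++frac_digits;
            any_digit = true;
        }
    }

    if (!any_digit || i != text.size()) return false;   // empty or trailing junk

    // Pad missing fractional digits: "18.8" with decimals == 2 becomes 1880.
    for (; frac_digits < decimals; ++frac_digits) value *= 10;

    out = negative ? -value : value;
    return true;
}
```

    With decimals set to 2, the input "18.8" is stored as the integer 1880, so no rounding happens until you decide how to print or convert the amount.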

    Spirit X3

    If you can use C++14, Spirit X3 can be more efficient and compiles a lot faster than either the Spirit Qi or the Xpressive approach:

    #include <boost/spirit/home/x3.hpp>
    #include <boost/fusion/adapted/struct.hpp>
    #include <string>
    #include <vector>
    
    struct Data {
        int integer;
        double real;
        std::string str;
    };
    
    BOOST_FUSION_ADAPT_STRUCT(Data, integer, real, str);
    
    namespace Parsers {
        using namespace boost::spirit::x3;
    
        static auto const data 
            = rule<struct Data_, ::Data> {} 
            = no_case[
                lit("int") >> ':' >> int_ >> '-' 
                >> "real" >> ':' >> double_ >> '-' 
                >> "str" >> ':' >> +(alnum|char_('-')) >> +eol
            ];
    
        static auto const datas = skip(blank)[*data];
    }
    
    #include <iostream>
    
    int main() {
        std::vector<Data> container;
    
        std::string const& input = "iNT: 0 - Real: 18.8 - Str: ABC-1005\n\nInt: 1-Real:21.3 -sTR:BCD-1006\n\n";
    
        auto f = input.begin(), l = input.end();
        if (parse(f, l, Parsers::datas, container)) {
            std::cout << "Parsed:\n";
            for(auto& r : container) {
                std::cout << "[ " << r.integer << "; " << r.real << "; " << r.str << " ]\n";
            }
        } else {
            std::cout << "Parse failed\n";
        }
    
        if (f != l) {
            std::cout << "Remaining input: '" << std::string(f,l) << "'\n";
        }
    }
    

    Prints (it's getting boring):

    Parsed:
    [ 0; 18.8; ABC-1005 ]
    [ 1; 21.3; BCD-1006 ]