boost spirit X3 parser which produces offsets into the original string


I am trying to write a boost::spirit::x3 parser which, rather than producing the matched sub-strings, produces the offsets and lengths of the matched strings in the source.

I have tried various combinations of on_success handlers and semantic actions, but nothing has really worked.

Given:

ABC\n
DEFG\n
HI\n

I'd like a parser which produces a std::vector<boost::tuple<size_t, size_t>> containing:

0,3
4,4
9,2

Clearly, it gets more complicated once we match specific substrings on each line rather than taking the whole line.

Is this possible?


Solution

  • Here's a quick draft.

    I've replaced tuple<p, len> with a POD struct, because x3::raw[] and fusion/adapted/std_tuple.hpp interact in such a way that you need to specialize traits::move_to anyway.

    In such cases I hugely prefer specializing on a user-defined custom type, rather than special-casing generic standard library types, which could collide with other uses elsewhere.

    So, let the struct be

    using It = char const*;
    struct Range {
       It data;
       size_t size;
    };
    

    Then, to parse the following sample input:

    char const input[] = "{ 123, 234, 345 }\n{ 456, 567, 678 }\n{ 789, 900, 1011 }";
    

    We need nothing more than a simple grammar:

    x3::raw ['{' >> (x3::int_ % ',') >> '}'] % x3::eol
    

    And a matching trait specialization:

    namespace boost { namespace spirit { namespace x3 { namespace traits {
        template <> void move_to<It, Range>(It b, It e, Range& r) { r = { b, size_t(e-b) }; }
    } } } }
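
    To see what this specialization computes without involving Spirit, here is a minimal stand-alone sketch. `make_range` is a hypothetical helper (not part of Boost) that mirrors the body of the `move_to` overload above: given the begin and end iterators of a match, it keeps the start pointer and the length.

    ```cpp
    #include <cstddef>

    using It = char const*;
    struct Range {
        It data;
        std::size_t size;
    };

    // Hypothetical stand-in for the body of the move_to specialization:
    // [b, e) is the matched span; store its start and its length.
    Range make_range(It b, It e) {
        return Range{ b, static_cast<std::size_t>(e - b) };
    }
    ```

    Applied to the question's input `"ABC\nDEFG\nHI\n"`, `make_range(text + 4, text + 8)` describes the `DEFG` line: subtracting `text` from `data` gives offset 4, and `size` is 4.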
    

    Full Demo

    #include <boost/spirit/home/x3.hpp>
    #include <iostream>
    #include <iterator> // std::begin, std::end
    #include <vector>
    
    using It = char const*;
    struct Range {
       It data;
       size_t size;
    };
    
    namespace boost { namespace spirit { namespace x3 { namespace traits {
        template <> void move_to<It, Range>(It b, It e, Range& r) { r = { b, size_t(e-b) }; }
    } } } }
    
    int main() {
        char const input[] = "{ 123, 234, 345 }\n{ 456, 567, 678 }\n{ 789, 900, 1011 }";
    
        std::vector<Range> ranges;
    
        namespace x3 = boost::spirit::x3;
        if (x3::phrase_parse(
                std::begin(input), std::end(input), 
                x3::raw ['{' >> (x3::int_ % ',') >> '}'] % x3::eol,
                x3::blank,
                ranges)
            )
        {
            std::cout << "Parse results:\n";
            for (auto const& r : ranges) {
                std::cout << "(" << (r.data-input) << "," << r.size << ")\n";
            }
        } else {
            std::cout << "Parse failed\n";
        }
    }
    

    Prints:

    Parse results:
    (0,17)
    (18,17)
    (36,18)
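
    If you want the (offset, length) pairs from the question rather than pointers, the parsed ranges can be converted afterwards. The following is a minimal sketch, not part of the demo above: `to_offsets` is a hypothetical helper, and it uses `std::tuple` instead of `boost::tuple` to stay free of Boost. It assumes every `Range::data` points into the buffer that starts at `base`.

    ```cpp
    #include <cstddef>
    #include <tuple>
    #include <vector>

    using It = char const*;
    struct Range {
        It data;
        std::size_t size;
    };

    // Hypothetical helper: converts pointer-based ranges into
    // (offset, length) pairs relative to `base`.
    // Assumes each r.data points into the buffer beginning at `base`.
    std::vector<std::tuple<std::size_t, std::size_t>>
    to_offsets(It base, std::vector<Range> const& ranges) {
        std::vector<std::tuple<std::size_t, std::size_t>> out;
        out.reserve(ranges.size());
        for (auto const& r : ranges) {
            out.emplace_back(static_cast<std::size_t>(r.data - base), r.size);
        }
        return out;
    }
    ```

    In the demo, calling `to_offsets(input, ranges)` after a successful parse would yield the same numbers that the loop prints.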