
Parse a particular string using Boost Spirit Qi


I am new to Boost Spirit and am struggling to create a proper expression to parse the following input (actually the stdout of some command):

^+ line-17532.dyn.kponet.fi      2   7   377     1   +1503us[+9103us] +/-   55ms

I need to parse this into a set of strings and integers stored in variables. Most of the line should simply be parsed into a variable of the appropriate type (string or int), so that in the end I get:

string:  "^+", "line-17532.dyn.kponet.fi", "+1503us", "+9103us", "55ms"
int   :   2, 7, 377, 1 

The pair

+1503us[+9103us] 

can also contain a space:

+503us[ +103us]

and I need the part before the square brackets and the part inside them to be placed in separate strings.
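To pin the requirement down independently of Spirit, here is a minimal plain standard-C++ sketch of that split (not a Spirit solution). `split_sample_pair` and `trim_blanks` are hypothetical helper names, and the sketch assumes the pair always has the form `A[B]`, with optional blanks inside the brackets.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: strips leading and trailing spaces/tabs.
std::string trim_blanks(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: splits "A[B]" into {"A", "B"}, tolerating blanks
// inside the brackets (e.g. "+503us[ +103us]").
// Returns nullopt when either bracket is missing.
std::optional<std::pair<std::string, std::string>>
split_sample_pair(const std::string& text) {
    const auto open = text.find('[');
    if (open == std::string::npos) return std::nullopt;
    const auto close = text.find(']', open + 1);
    if (close == std::string::npos) return std::nullopt;
    return std::make_pair(trim_blanks(text.substr(0, open)),
                          trim_blanks(text.substr(open + 1, close - open - 1)));
}
```

For example, both `"+1503us[+9103us]"` and `"+503us[ +103us]"` yield two clean strings, while input without brackets is rejected.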

Additionally, the time units can be any of

ns, ms, us, s

I would appreciate examples of how to deal with this, because the available documentation is quite sparse and not cohesive.


A larger piece of the log, along with the headings describing the individual fields:

MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^+ ns2.sdi.fi                    2   9   377   381  -1476us[-1688us] +/-   72ms
^+ line-17532.dyn.kponet.fi      2  10   377   309   +302us[ +302us] +/-   59ms
^* heh.fi                        2  10   377   319  -1171us[-1387us] +/-   50ms
^+ stara.mulimuli.fi             3  10   377   705  -1253us[-1446us] +/-   73ms

Solution

  • As always, I start by sketching a useful AST:

    namespace AST {
        using clock = std::chrono::high_resolution_clock;
    
        struct TimeSample {
            enum Direction { up, down } direction; // + or -
            clock::duration value;
        };
    
        struct Record {
            std::string prefix; // "^+"
            std::string fqdn;   // "line-17532.dyn.kponet.fi"
            int a, b, c, d;     // 2, 7, 377, 1
            TimeSample primary, braced;
            clock::duration tolerance;
        };
    }
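For the example line, the parser is expected to fill that AST as follows. The hand-built value below (`expected_record` is a hypothetical helper) repeats the AST so the snippet compiles on its own and can serve as a reference in tests. It assumes `clock::duration` can represent microseconds exactly, which holds where it is nanoseconds, as on typical implementations.

```cpp
#include <chrono>
#include <string>

namespace AST {
    using clock = std::chrono::high_resolution_clock;

    struct TimeSample {
        enum Direction { up, down } direction; // + or -
        clock::duration value;
    };

    struct Record {
        std::string prefix; // "^+"
        std::string fqdn;   // "line-17532.dyn.kponet.fi"
        int a, b, c, d;     // 2, 7, 377, 1
        TimeSample primary, braced;
        clock::duration tolerance;
    };
}

// Hypothetical reference value: what parsing the example line should produce.
AST::Record expected_record() {
    using namespace std::chrono_literals;
    return {"^+", "line-17532.dyn.kponet.fi", 2, 7, 377, 1,
            {AST::TimeSample::up, 1503us},  // +1503us
            {AST::TimeSample::up, 9103us},  // [+9103us]
            55ms};                          // +/- 55ms
}
```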
    

    Now that we know what we want to parse, we can mostly just mirror the AST with rules:

    using namespace qi;
    
    start     = skip(blank) [record_];
    
    record_   = prefix_ >> fqdn_ >> int_ >> int_ >> int_ >> int_ >> sample_ >> '[' >> sample_ >> ']' >> tolerance_;
    
    prefix_   = string("^+"); // or whatever you need to match here
    fqdn_     = +graph; // or whatever additional constraints you have
    sample_   = direction_ >> duration_;
    duration_ = (long_ >> units_) [ _val = _1 * _2 ];
    tolerance_= "+/-" >> duration_;
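As a cross-check of what `record_` accepts, the same field sequence can be approximated with a single `std::regex`. This is a different technique from Spirit, shown only for comparison. It is a sketch under stated assumptions: every field comes back as a string, only the `ns`/`us`/`ms`/`s` units are accepted, no blank is allowed between a sign and its number (the Qi skipper would allow one), and, like `prefix_`, only the `^+` prefix matches. `match_record` is a hypothetical name.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical cross-check: one std::regex mirroring record_ field by field.
// Returns the 9 captured fields as strings, or an empty vector on mismatch.
std::vector<std::string> match_record(const std::string& line) {
    static const std::regex record_re(
        R"((\^\+))"                                          // prefix_ (only "^+")
        R"([ \t]+(\S+))"                                     // fqdn_ (+graph)
        R"([ \t]+(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]+(\d+))"    // int_ x 4
        R"([ \t]*([+-]\d+(?:ns|us|ms|s)))"                   // primary sample_
        R"([ \t]*\[[ \t]*([+-]\d+(?:ns|us|ms|s))[ \t]*\])"   // '[' sample_ ']'
        R"([ \t]*\+/-[ \t]*(\d+(?:ns|us|ms|s))[ \t]*)");     // tolerance_
    std::smatch m;
    if (!std::regex_match(line, m, record_re)) return {};
    std::vector<std::string> fields;
    for (std::size_t i = 1; i < m.size(); ++i) fields.push_back(m[i].str());
    return fields;
}
```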
    

    Of course, the interesting bits are the units and the direction:

    struct directions : qi::symbols<char, AST::TimeSample::Direction> {
        directions() { add("+", AST::TimeSample::up)("-", AST::TimeSample::down); }
    } direction_;
    struct units : qi::symbols<char, AST::clock::duration> {
        units() {
            using namespace std::literals::chrono_literals;
            add("s", 1s)("ms", 1ms)("us", 1us)("µs", 1us)("ns", 1ns);
        }
    } units_;
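What `duration_` computes with this table can be sketched without Spirit: read the digits, look up the unit suffix (preferring the longest match, as `qi::symbols` does), and multiply. `read_duration` is a hypothetical helper. It assumes the sign has already been consumed (as `direction_` does in the grammar) and does not handle `std::stol` overflow.

```cpp
#include <cctype>
#include <chrono>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: reads "<digits><unit>" at pos and advances pos on success,
// mirroring duration_ = (long_ >> units_)[_val = _1 * _2].
std::optional<std::chrono::nanoseconds> read_duration(const std::string& s, std::size_t& pos) {
    using namespace std::chrono_literals;
    static const std::pair<std::string, std::chrono::nanoseconds> units[] = {
        {"s", 1s}, {"ms", 1ms}, {"us", 1us}, {"\xC2\xB5s", 1us} /* UTF-8 "µs" */, {"ns", 1ns}};

    std::size_t i = pos;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) ++i;
    if (i == pos) return std::nullopt;                      // no digits
    const long value = std::stol(s.substr(pos, i - pos));   // may throw on overflow

    // Prefer the longest unit that matches at i, like qi::symbols.
    std::size_t best_len = 0;
    std::chrono::nanoseconds unit{};
    for (const auto& [name, scale] : units) {
        if (name.size() > best_len && s.compare(i, name.size(), name) == 0) {
            best_len = name.size();
            unit = scale;
        }
    }
    if (best_len == 0) return std::nullopt;                 // unknown or missing unit
    pos = i + best_len;                                     // stop right after the unit
    return unit * value;
}
```

On failure `pos` is left untouched, which loosely corresponds to a Qi parser not advancing its iterator when it fails.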
    

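    Whitespace acceptance is governed by a skipper. I chose qi::blank_type (spaces and tabs, but not newlines) for the non-lexeme rules; the rules declared without a skipper (prefix_ and fqdn_) act as implicit lexemes, so no blanks are skipped inside them: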

    using Skipper = qi::blank_type;
    qi::rule<It, AST::Record()> start;
    qi::rule<It, AST::Record(), Skipper> record_;
    qi::rule<It, AST::TimeSample(), Skipper> sample_;
    qi::rule<It, AST::clock::duration(), Skipper> duration_, tolerance_;
    // lexemes:
    qi::rule<It, std::string()> prefix_;
    qi::rule<It, std::string()> fqdn_;
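To make the skipper/lexeme distinction concrete, here is a plain C++ sketch of the two behaviours (hypothetical helpers, not Spirit code). `skip_blank` skips only spaces and tabs, as `qi::blank` does, so a newline stays in the input. `read_graph_lexeme` mimics `fqdn_ = +graph` running without a skipper: it stops at the first blank instead of jumping over it. The calling rule still pre-skips before the lexeme starts, which is why the debug trace shows `fqdn_` trying at `line-17532...` rather than at the blank.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Like qi::blank used as a skipper: only spaces and tabs are skipped,
// so a '\n' is left in the input.
std::size_t skip_blank(const std::string& s, std::size_t pos) {
    while (pos < s.size() && (s[pos] == ' ' || s[pos] == '\t')) ++pos;
    return pos;
}

// Like fqdn_ = +graph declared without a skipper (an implicit lexeme):
// consumes printable non-blank characters and stops at the first blank.
std::string read_graph_lexeme(const std::string& s, std::size_t& pos) {
    const std::size_t start = pos;
    while (pos < s.size() && std::isgraph(static_cast<unsigned char>(s[pos]))) ++pos;
    return s.substr(start, pos - start);
}
```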
    

    DEMO

    Putting it all together:

    int main() {
        std::istringstream iss(R"(^+ line-17532.dyn.kponet.fi      2   7   377     1   +1503us[+9103us] +/-   55ms
    )");
    
        std::string line;
    
        while (getline(iss, line)) {
            auto f = line.cbegin(), l = line.cend();
            AST::Record record;
            if (parse(f, l, parser<>{}, record))
                std::cout << "parsed: " << boost::fusion::as_vector(record) << "\n";
            else
                std::cout << "parse error\n";
    
            if (f!=l)
                std::cout << "remaining unparsed input: '" << std::string(f,l) << "'\n";
        }
    }
    

    Which prints:

    parsed: (^+ line-17532.dyn.kponet.fi 2 7 377 1 +0.001503s +0.009103s 0.055s)
    

    (debug output below)
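The durations in that output are the parsed clock::duration values converted to floating-point seconds, the same way the debug operator<< in the full listing does it. For example, 1503us is stored as 1503000ns and printed as 0.001503s. A small sketch of that conversion (`as_seconds` is a hypothetical helper):

```cpp
#include <chrono>
#include <sstream>
#include <string>

// Formats a duration as floating-point seconds with default stream precision,
// matching the "0.001503s" style of the output above.
std::string as_seconds(std::chrono::nanoseconds d) {
    std::ostringstream os;
    os << std::chrono::duration<double>(d).count() << "s";
    return os.str();
}
```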

    Full Code:

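    #define BOOST_SPIRIT_DEBUG
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <boost/fusion/adapted.hpp>
    #include <boost/fusion/include/io.hpp> // operator<< for fusion sequences (as_vector output)
    #include <iostream>
    #include <sstream>
    #include <chrono>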
    
    namespace std { namespace chrono {
        // for debug
        std::ostream& operator<<(std::ostream& os, duration<double> d) { return os << d.count() << "s"; }
    } }
    
    namespace AST {
        using clock = std::chrono::high_resolution_clock;
    
        struct TimeSample {
            enum Direction { up, down } direction; // + or -
            clock::duration value;
    
            // for debug:
            friend std::ostream& operator<<(std::ostream& os, Direction d) {
                char const* signs[] = {"+","-"};
                return os << signs[d];
            }
            friend std::ostream& operator<<(std::ostream& os, TimeSample const& sample) {
                return os << sample.direction << std::chrono::duration<double>(sample.value).count() << "s";
            }
        };
    
        struct Record {
            std::string prefix; // "^+"
            std::string fqdn;   // "line-17532.dyn.kponet.fi"
            int a, b, c, d;     // 2, 7, 377, 1
            TimeSample primary, braced;
            clock::duration tolerance;
        };
    }
    
    BOOST_FUSION_ADAPT_STRUCT(AST::Record, prefix, fqdn, a, b, c, d, primary, braced, tolerance)
    BOOST_FUSION_ADAPT_STRUCT(AST::TimeSample, direction, value)
    
    namespace qi = boost::spirit::qi;
    
    template <typename It = std::string::const_iterator>
    struct parser : qi::grammar<It, AST::Record()> {
        parser() : parser::base_type(start) {
            using namespace qi;
    
            start     = skip(blank) [record_];
    
            record_   = prefix_ >> fqdn_ >> int_ >> int_ >> int_ >> int_ >> sample_ >> '[' >> sample_ >> ']' >> tolerance_;
    
            prefix_   = string("^+"); // or whatever you need to match here
            fqdn_     = +graph; // or whatever additional constraints you have
            sample_   = direction_ >> duration_;
            duration_ = (long_ >> units_) [ _val = _1 * _2 ];
            tolerance_= "+/-" >> duration_;
    
            BOOST_SPIRIT_DEBUG_NODES(
                    (start)(record_)
                    (prefix_)(fqdn_)(sample_)(duration_)(tolerance_)
                )
        }
      private:
        struct directions : qi::symbols<char, AST::TimeSample::Direction> {
            directions() { add("+", AST::TimeSample::up)("-", AST::TimeSample::down); }
        } direction_;
        struct units : qi::symbols<char, AST::clock::duration> {
            units() {
                using namespace std::literals::chrono_literals;
                add("s", 1s)("ms", 1ms)("us", 1us)("µs", 1us)("ns", 1ns);
            }
        } units_;
    
        using Skipper = qi::blank_type;
        qi::rule<It, AST::Record()> start;
        qi::rule<It, AST::Record(), Skipper> record_;
        qi::rule<It, AST::TimeSample(), Skipper> sample_;
        qi::rule<It, AST::clock::duration(), Skipper> duration_, tolerance_;
        // lexemes:
        qi::rule<It, std::string()> prefix_;
        qi::rule<It, std::string()> fqdn_;
    };
    
    int main() {
        std::istringstream iss(R"(^+ line-17532.dyn.kponet.fi      2   7   377     1   +1503us[+9103us] +/-   55ms
    )");
    
        std::string line;
    
        while (getline(iss, line)) {
            auto f = line.cbegin(), l = line.cend();
            AST::Record record;
            if (parse(f, l, parser<>{}, record))
                std::cout << "parsed: " << boost::fusion::as_vector(record) << "\n";
            else
                std::cout << "parse error\n";
    
            if (f!=l)
                std::cout << "remaining unparsed input: '" << std::string(f,l) << "'\n";
        }
    }
    

    Debug Output

    <start>
      <try>^+ line-17532.dyn.kp</try>
      <record_>
        <try>^+ line-17532.dyn.kp</try>
        <prefix_>
          <try>^+ line-17532.dyn.kp</try>
          <success> line-17532.dyn.kpon</success>
          <attributes>[[^, +]]</attributes>
        </prefix_>
        <fqdn_>
          <try>line-17532.dyn.kpone</try>
          <success>      2   7   377   </success>
          <attributes>[[l, i, n, e, -, 1, 7, 5, 3, 2, ., d, y, n, ., k, p, o, n, e, t, ., f, i]]</attributes>
        </fqdn_>
        <sample_>
          <try>   +1503us[+9103us] </try>
          <duration_>
            <try>1503us[+9103us] +/- </try>
            <success>[+9103us] +/-   55ms</success>
            <attributes>[0.001503s]</attributes>
          </duration_>
          <success>[+9103us] +/-   55ms</success>
          <attributes>[[+, 0.001503s]]</attributes>
        </sample_>
        <sample_>
          <try>+9103us] +/-   55ms</try>
          <duration_>
            <try>9103us] +/-   55ms</try>
            <success>] +/-   55ms</success>
            <attributes>[0.009103s]</attributes>
          </duration_>
          <success>] +/-   55ms</success>
          <attributes>[[+, 0.009103s]]</attributes>
        </sample_>
        <tolerance_>
          <try> +/-   55ms</try>
          <duration_>
            <try>   55ms</try>
            <success></success>
            <attributes>[0.055s]</attributes>
          </duration_>
          <success></success>
          <attributes>[0.055s]</attributes>
        </tolerance_>
        <success></success>
        <attributes>[[[^, +], [l, i, n, e, -, 1, 7, 5, 3, 2, ., d, y, n, ., k, p, o, n, e, t, ., f, i], 2, 7, 377, 1, [+, 0.001503s], [+, 0.009103s], 0.055s]]</attributes>
      </record_>
      <success></success>
      <attributes>[[[^, +], [l, i, n, e, -, 1, 7, 5, 3, 2, ., d, y, n, ., k, p, o, n, e, t, ., f, i], 2, 7, 377, 1, [+, 0.001503s], [+, 0.009103s], 0.055s]]</attributes>
    </start>
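The demo feeds a single line. For the larger log from the question, the heading and the ===== separator have to be dropped first. Here is a minimal sketch, assuming (as in the sample) that every data row starts with '^'; `data_rows` is a hypothetical helper. Note that the grammar's prefix_ only accepts "^+", so a row such as "^* heh.fi ..." will still report a parse error until prefix_ is widened.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical pre-filter: keeps only rows starting with '^', dropping the
// heading line and the "=====" separator before lines reach the parser.
std::vector<std::string> data_rows(const std::string& log) {
    std::istringstream in(log);
    std::vector<std::string> rows;
    std::string line;
    while (std::getline(in, line))
        if (!line.empty() && line.front() == '^')
            rows.push_back(line);
    return rows;
}
```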