Tags: c++, boost-spirit, boost-spirit-qi

How to parse RFC3339 Duration with Boost.Spirit


I'm trying to parse a duration from RFC3339 using Boost.Spirit, but I'm having trouble.

The grammar is defined in RFC3339 Appendix A:

dur-second = 1*DIGIT "S"
dur-minute = 1*DIGIT "M" [dur-second]
dur-hour   = 1*DIGIT "H" [dur-minute]

dur-time   = "T" (dur-hour / dur-minute / dur-second)

dur-day    = 1*DIGIT "D"
dur-month  = 1*DIGIT "M" [dur-day]
dur-year   = 1*DIGIT "Y" [dur-month]

dur-date   = (dur-day / dur-month / dur-year) [dur-time]

duration   = "P" (dur-date / dur-time)

(I dropped support for weeks.)
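To see what that ABNF actually accepts, here is a sketch of a hand-written recognizer (plain C++, no Spirit) that mirrors each production one-to-one; the namespace rfc and all function names are hypothetical. Ordered choice is safe here because each alternative is identified by its unit letter. Note that a literal reading of the ABNF does not allow skipping a unit in the middle: "P1Y3D" and "PT1H30S" do not match, because dur-year may only be followed by dur-month and dur-hour only by dur-minute.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Recognizer mirroring the RFC 3339 Appendix A duration ABNF (weeks omitted).
// Each function advances `pos` past what it consumed on success and leaves
// `pos` untouched on failure.
namespace rfc {

// 1*DIGIT followed by the given unit letter
inline bool digits_then(std::string const& s, std::size_t& pos, char unit) {
    std::size_t p = pos;
    while (p < s.size() && std::isdigit(static_cast<unsigned char>(s[p])))
        ++p;
    if (p == pos || p == s.size() || s[p] != unit)
        return false;
    pos = p + 1;
    return true;
}

inline bool dur_second(std::string const& s, std::size_t& pos) { return digits_then(s, pos, 'S'); }

inline bool dur_minute(std::string const& s, std::size_t& pos) {
    if (!digits_then(s, pos, 'M')) return false;
    dur_second(s, pos); // [dur-second]
    return true;
}

inline bool dur_hour(std::string const& s, std::size_t& pos) {
    if (!digits_then(s, pos, 'H')) return false;
    dur_minute(s, pos); // [dur-minute]
    return true;
}

// "T" (dur-hour / dur-minute / dur-second)
inline bool dur_time(std::string const& s, std::size_t& pos) {
    if (pos >= s.size() || s[pos] != 'T') return false;
    std::size_t p = pos + 1;
    if (!(dur_hour(s, p) || dur_minute(s, p) || dur_second(s, p))) return false;
    pos = p;
    return true;
}

inline bool dur_day(std::string const& s, std::size_t& pos) { return digits_then(s, pos, 'D'); }

inline bool dur_month(std::string const& s, std::size_t& pos) {
    if (!digits_then(s, pos, 'M')) return false;
    dur_day(s, pos); // [dur-day]
    return true;
}

inline bool dur_year(std::string const& s, std::size_t& pos) {
    if (!digits_then(s, pos, 'Y')) return false;
    dur_month(s, pos); // [dur-month]
    return true;
}

// (dur-day / dur-month / dur-year) [dur-time]
inline bool dur_date(std::string const& s, std::size_t& pos) {
    if (!(dur_day(s, pos) || dur_month(s, pos) || dur_year(s, pos))) return false;
    dur_time(s, pos); // [dur-time]
    return true;
}

// "P" (dur-date / dur-time), with nothing left over
inline bool is_duration(std::string const& s) {
    if (s.empty() || s[0] != 'P') return false;
    std::size_t pos = 1;
    if (!(dur_date(s, pos) || dur_time(s, pos))) return false;
    return pos == s.size();
}

} // namespace rfc
```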

Which I translated to this grammar:

auto second = copy(int_ >> "S");
auto minute = copy(int_ >> "M" >> -second);
auto hour = copy(int_ >> "H" >> -minute);

auto time = copy("T" >> (hour | minute | second));

auto day = copy(int_ >> "D");
auto month = copy(int_ >> "M" >> -day);
auto year = copy(int_ >> "Y" >> -month);

auto date = copy((day | month | year) >> -time);

m_start = "P" >> (date | time);

I believe this grammar is a faithful representation of the grammar in RFC3339, but I could be wrong.

The trouble I'm having is getting the information out. I usually create a POD structure to pass in that gets filled out automatically, but as written, I'm getting a compile error. First, the code:

#include "boost/fusion/adapted/struct.hpp"
#include "boost/spirit/include/qi.hpp"
#include "boost/spirit/include/qi_copy.hpp"

#include <cstdlib>
#include <string>

struct Duration {
  int years;
  int months;
  int days;
  int hours;
  int minutes;
  int seconds;
};

BOOST_FUSION_ADAPT_STRUCT(
  Duration,
  (int, years)(int, months)(int, days)(int, hours)(int, minutes)(int, seconds))

struct Duration_grammar :
    boost::spirit::qi::grammar<std::string::const_iterator, Duration()> {
  Duration_grammar() : Duration_grammar::base_type{m_start}
  {
    using boost::spirit::qi::copy;
    using boost::spirit::qi::int_;

    auto second = copy(int_ >> "S");
    auto minute = copy(int_ >> "M" >> -second);
    auto hour = copy(int_ >> "H" >> -minute);

    auto time = copy("T" >> (hour | minute | second));

    auto day = copy(int_ >> "D");
    auto month = copy(int_ >> "M" >> -day);
    auto year = copy(int_ >> "Y" >> -month);

    auto date = copy((day | month | year) >> -time);

    m_start = "P" >> (date | time);
  }

private:
  boost::spirit::qi::rule<Duration_grammar::iterator_type, Duration()> m_start;
};

int main(int argc, char** argv)
{
  auto const value = std::string{"P1Y"};

  auto begin = value.begin();
  auto end = value.end();
  auto duration = Duration{};

  if (boost::spirit::qi::parse(begin, end, Duration_grammar{}, duration) &&
      (begin == end)) {
    return EXIT_SUCCESS;
  } else {
    return EXIT_FAILURE;
  }
}

And the error:

/usr/include/boost/spirit/home/qi/detail/assign_to.hpp:153:20: error: no matching function for call to 'Duration::Duration(const int&)'
  153 |             attr = static_cast<Attribute>(val);
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~

It seems like the attribute of m_start is not a sequence of six integers like the Duration POD structure would like, but I'm not sure how to approach this. I went with the auto and copy approach because I couldn't figure out the actual attribute types that each rule would have. Do I need a different grammar to get what I want (i.e., is copying the grammar from RFC3339 a bad approach)?


Solution

  • As always, if you want to enable automatic attribute propagation (which I heartily recommend), it helps to maintain a 1:1 correspondence between the rule/expression structure and the AST structure.

    In that sense, I agree that transcribing the grammar from RFC3339 Appendix A is not well suited to creating a PEG grammar, let alone a Spirit one.
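    As a minimal sketch of that correspondence (the YearMonth struct and parse_year_month function are hypothetical, for illustration only): a sequence of two int-producing parsers maps member by member, in declaration order, onto an adapted struct with two int members.

    ```cpp
    #include <boost/fusion/adapted/struct.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <string>

    // Hypothetical two-field AST, used only to illustrate the 1:1 mapping.
    struct YearMonth { int years, months; };
    BOOST_FUSION_ADAPT_STRUCT(YearMonth, years, months)

    namespace qi = boost::spirit::qi;

    // The sequence exposes (int, int); the literals 'Y', 'M' and eoi expose nothing.
    // That shape matches YearMonth exactly, so no semantic actions are needed.
    bool parse_year_month(std::string const& input, YearMonth& out) {
        auto first = input.begin();
        return qi::parse(first, input.end(),
                         qi::int_ >> 'Y' >> qi::int_ >> 'M' >> qi::eoi, out);
    }
    ```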

    First Shot

    FWIW, my naive attempt to clean up the code into a self-contained example compiles and runs without errors:

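    #include <boost/fusion/adapted/struct.hpp>
    #include <boost/fusion/include/io.hpp> // operator<< for fusion sequences
    #include <boost/spirit/include/qi.hpp>
    #include <iomanip>
    #include <iostream>
    #include <string>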
    
    namespace AST {
        struct Duration { int years, months, days, hours, minutes, seconds; };
        using boost::fusion::operator<<;
    } // namespace AST
    
    BOOST_FUSION_ADAPT_STRUCT(AST::Duration, years, months, days, hours, minutes, seconds)
    
    namespace Grammar {
        namespace qi = boost::spirit::qi;
        struct Duration : qi::grammar<std::string::const_iterator, AST::Duration()> {
            Duration() : Duration::base_type{m_start} {
                using qi::copy;
                using qi::int_;
    
                auto second = copy(int_ >> "S" /*| qi::attr(0)*/);
                auto minute = copy(int_ >> "M" >> -second);
                auto hour   = copy(int_ >> "H" >> -minute);
                auto time   = copy("T" >> (hour | minute | second));
                auto day    = copy(int_ >> "D");
                auto month  = copy(int_ >> "M" >> -day);
                auto year   = copy(int_ >> "Y" >> -month);
                auto date   = copy((day | month | year) >> -time);
    
                m_start = "P" >> (date | time) >> qi::eoi;
            }
    
          private:
            qi::rule<iterator_type, AST::Duration()> m_start;
        };
    } // namespace Grammar
    
    int main() {
        Grammar::Duration const p;
    
        for (std::string const value : {
                 "P1Y", "P2M", "P3W", "P4D", //
                 "P5H", "P6M", "P7S",        //
                 "PT8H", "PT9M", "PT10S",    //
             }) {
            AST::Duration duration{};
    
            if (parse(begin(value), end(value), p, duration))
                std::cout << quoted(value) << " -> " << duration << "\n";
            else
                std::cout << quoted(value) << " FAILED\n";
        }
    }
    

    Of course, the results are unusable: every number ends up in the first member, whatever its unit:

    "P1Y" -> (1 0 0 0 0 0)
    "P2M" -> (2 0 0 0 0 0)
    "P3W" FAILED
    "P4D" -> (4 0 0 0 0 0)
    "P5H" FAILED
    "P6M" -> (6 0 0 0 0 0)
    "P7S" FAILED
    "PT8H" -> (8 0 0 0 0 0)
    "PT9M" -> (9 0 0 0 0 0)
    "PT10S" -> (10 0 0 0 0 0)
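    A reduced sketch shows roughly why (parse_one_unit is a hypothetical helper, not part of the grammar above): when every branch of an alternative exposes an int, the alternative as a whole exposes a single int. The number survives, but which unit it was attached to does not, so "4Y" and "4D" yield the same attribute.

    ```cpp
    #include <boost/spirit/include/qi.hpp>
    #include <string>

    // Reduced version of the question's (day | month | year) alternative.
    // All three branches expose int, so the alternative exposes one int and
    // the matched unit letter is not recorded anywhere in the attribute.
    bool parse_one_unit(std::string const& input, int& value) {
        namespace qi = boost::spirit::qi;
        auto first = input.begin();
        return qi::parse(first, input.end(),
                         (qi::int_ >> 'Y' | qi::int_ >> 'M' | qi::int_ >> 'D') >> qi::eoi,
                         value);
    }
    ```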
    

    Fixing The Grammar

    Following the link through to the original spec, I find this much simpler form:

    P[n]Y[n]M[n]DT[n]H[n]M[n]S or P[n]W

    That seems a much more tractable match (especially since weeks don't seem to be in scope for you).
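    For comparison only, the flat form is regular enough that plain std::regex can express it. This swaps in a different technique from the Spirit grammar, and FlatDuration and parse_flat are hypothetical names. The point is that each optional capture group maps to exactly one field, in field order, which is the same 1:1 property the Spirit grammar relies on. Unlike qi::int_, \d+ does not accept a sign.

    ```cpp
    #include <regex>
    #include <string>

    // Hypothetical flat AST with zero defaults.
    struct FlatDuration { int years = 0, months = 0, days = 0, hours = 0, minutes = 0, seconds = 0; };

    // Matches P[n]Y[n]M[n]DT[n]H[n]M[n]S (no weeks). Capture groups 1..6 correspond
    // to years, months, days, hours, minutes, seconds; absent groups become 0.
    bool parse_flat(std::string const& s, FlatDuration& out) {
        static std::regex const re{
            R"(P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?)"};
        std::smatch m;
        if (!std::regex_match(s, m, re))
            return false;
        int* fields[] = {&out.years, &out.months, &out.days, &out.hours, &out.minutes, &out.seconds};
        for (int i = 0; i < 6; ++i)
            *fields[i] = m[i + 1].matched ? std::stoi(m[i + 1].str()) : 0;
        return true;
    }
    ```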

    Next up, none of your adapted members are actually optional, so let's not create an inconsistency there; instead, let's spell out exactly what we want (with zero defaults):

    m_start = 'P'                          //
        >> (qi::int_ >> 'Y' | qi::attr(0)) //
        >> (qi::int_ >> 'M' | qi::attr(0)) //
        >> (qi::int_ >> 'D' | qi::attr(0)) //
        >> ('T' | qi::eoi)                 //
        >> (qi::int_ >> 'H' | qi::attr(0)) //
        >> (qi::int_ >> 'M' | qi::attr(0)) //
        >> (qi::int_ >> 'S' | qi::attr(0)) //
        >> qi::eoi;
    

    Note how:

    • the grammar matches the AST layout
    • there are no more conditionals
    • there are no more optionals
    • there are no more nested optional sub-expressions
    • end-of-input matching is integrated into the grammar instead of relying on the caller to check the iterators
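    For reference, here is a self-contained sketch that wraps the corrected rule in a free function so it can be called and checked directly (parse_duration is a hypothetical name, not part of the original answer). The rule body is the one shown above; plain = is enough because there are no semantic actions, so attributes propagate automatically.

    ```cpp
    #include <boost/fusion/adapted/struct.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <string>

    namespace AST {
        struct Duration { int years, months, days, hours, minutes, seconds; };
    } // namespace AST

    BOOST_FUSION_ADAPT_STRUCT(AST::Duration, years, months, days, hours, minutes, seconds)

    bool parse_duration(std::string const& input, AST::Duration& out) {
        namespace qi = boost::spirit::qi;
        using It = std::string::const_iterator;

        // One element per AST member, in member order; qi::attr(0) supplies the
        // zero default whenever a unit is absent.
        qi::rule<It, AST::Duration()> start;
        start = 'P'
            >> (qi::int_ >> 'Y' | qi::attr(0))
            >> (qi::int_ >> 'M' | qi::attr(0))
            >> (qi::int_ >> 'D' | qi::attr(0))
            >> ('T' | qi::eoi) // either the time part starts, or the input ends here
            >> (qi::int_ >> 'H' | qi::attr(0))
            >> (qi::int_ >> 'M' | qi::attr(0))
            >> (qi::int_ >> 'S' | qi::attr(0))
            >> qi::eoi;

        It first = input.begin();
        return qi::parse(first, input.end(), start, out);
    }
    ```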

    It also prints the desired outcomes for all the inputs shown previously:


    "P1Y" -> (1 0 0 0 0 0)
    "P2M" -> (0 2 0 0 0 0)
    "P3W" FAILED
    "P4D" -> (0 0 4 0 0 0)
    "P5H" FAILED
    "P6M" -> (0 6 0 0 0 0)
    "P7S" FAILED
    "PT8H" -> (0 0 0 8 0 0)
    "PT9M" -> (0 0 0 0 9 0)
    "PT10S" -> (0 0 0 0 0 10)
    

    Bonus

    Making the T delimiter optional and allowing fractional values:


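    #include <boost/fusion/adapted/struct.hpp>
    #include <boost/fusion/include/io.hpp> // operator<< for fusion sequences
    #include <boost/spirit/include/qi.hpp>
    #include <iomanip>
    #include <iostream>
    #include <string>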
    
    namespace AST {
        struct Duration {
            float years, months, days, hours, minutes, seconds;
        };
        using boost::fusion::operator<<;
    } // namespace AST
    
    BOOST_FUSION_ADAPT_STRUCT(AST::Duration, years, months, days, hours, minutes, seconds)
    
    namespace Grammar {
        namespace qi = boost::spirit::qi;
        struct Duration : qi::grammar<std::string::const_iterator, AST::Duration()> {
            Duration() : Duration::base_type{m_start} {
                m_start = 'P'                          //
                    >> (qi::auto_ >> 'Y' | qi::attr(0)) //
                    >> (qi::auto_ >> 'M' | qi::attr(0)) //
                    >> (qi::auto_ >> 'D' | qi::attr(0)) //
                    >> -qi::lit('T')                   //
                    >> (qi::auto_ >> 'H' | qi::attr(0)) //
                    >> (qi::auto_ >> 'M' | qi::attr(0)) //
                    >> (qi::auto_ >> 'S' | qi::attr(0)) //
                    >> qi::eoi;
            }
    
          private:
            qi::rule<iterator_type, AST::Duration()> m_start;
        };
    } // namespace Grammar
    
    int main() {
        Grammar::Duration const p;
    
        for (std::string const value : {
                 "P1Y", "P2M", "P3W", "P4D", //
                 "P5H", "P6M", "P7S",        //
                 "PT8H", "PT9M", "PT10S",    //
    
                 // optional T
                 "P11H",                       //
                 "P12M", "P13H14M", "P15M16M", //
                 "P17S",                       //
    
                 // fractionals
                 "P1.8Y", "P1.9M", "P2.1D",    //
                 "P2.2H", "P2.3M", "P2.4S",    //
                 "PT2.5H", "PT2.6M", "PT2.7S", //
    
                 // optional T
                 "P2.8H",                           //
                 "P2.9M", "P3.0H3.1M", "P3.2M3.3M", //
                 "P3.4S",                           //
             }) {
            AST::Duration duration{};
    
            if (parse(begin(value), end(value), p, duration))
                std::cout << quoted(value) << " -> " << duration << "\n";
            else
                std::cout << quoted(value) << " FAILED\n";
        }
    }
    

    Printing

    // earlier results unchanged; the new ones:
    "P11H" -> (0 0 0 11 0 0)
    "P12M" -> (0 12 0 0 0 0)
    "P13H14M" -> (0 0 0 13 14 0)
    "P15M16M" -> (0 15 0 0 16 0)
    "P17S" -> (0 0 0 0 0 17)
    "P1.8Y" -> (1.8 0 0 0 0 0)
    "P1.9M" -> (0 1.9 0 0 0 0)
    "P2.1D" -> (0 0 2.1 0 0 0)
    "P2.2H" -> (0 0 0 2.2 0 0)
    "P2.3M" -> (0 2.3 0 0 0 0)
    "P2.4S" -> (0 0 0 0 0 2.4)
    "PT2.5H" -> (0 0 0 2.5 0 0)
    "PT2.6M" -> (0 0 0 0 2.6 0)
    "PT2.7S" -> (0 0 0 0 0 2.7)
    "P2.8H" -> (0 0 0 2.8 0 0)
    "P2.9M" -> (0 2.9 0 0 0 0)
    "P3.0H3.1M" -> (0 0 0 3 3.1 0)
    "P3.2M3.3M" -> (0 3.2 0 0 3.3 0)
    "P3.4S" -> (0 0 0 0 0 3.4)
    

    Closing Thoughts

    Also consider parsing into Boost.DateTime objects directly, which could at once give you correct "overflow" between units (e.g. 90 minutes normalizing to 01:30:00) and, for example, evaluation of the effective duration across leap days, daylight saving changes, etc.
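    A minimal sketch of what that could look like, assuming the int-based AST from "Fixing The Grammar" and a zone-less boost::posix_time::ptime as the starting point. add_duration is a hypothetical helper, and applying years, then months, then days is an assumption. Boost.DateTime's month arithmetic snaps from the last day of a month to the last day of the target month, which is where leap days show up. Because ptime carries no time zone, daylight saving changes are not handled here; that would need boost::local_time.

    ```cpp
    #include <boost/date_time/gregorian/gregorian.hpp>
    #include <boost/date_time/posix_time/posix_time.hpp>

    // Same member layout as AST::Duration (redeclared to keep this self-contained).
    struct Duration { int years, months, days, hours, minutes, seconds; };

    // Hypothetical helper: applies the calendar units first (years, months, days),
    // then adds the clock units as a single time_duration.
    boost::posix_time::ptime add_duration(boost::posix_time::ptime const& start, Duration const& d) {
        namespace g  = boost::gregorian;
        namespace pt = boost::posix_time;

        // Calendar-aware: from an end-of-month date, months/years snap to end of month.
        g::date const day = start.date() + g::years(d.years) + g::months(d.months) + g::days(d.days);

        // time_duration normalizes by itself: 90 minutes is 01:30:00.
        pt::time_duration const clock = pt::hours(d.hours) + pt::minutes(d.minutes) + pt::seconds(d.seconds);

        return pt::ptime(day, start.time_of_day()) + clock;
    }
    ```

    For example, starting at 2024-01-31 00:00 and adding "P1MT90M" lands on 2024-02-29 01:30, because January 31 is the last day of its month and 2024 is a leap year.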

    Here's another "let's shoe-horn the RFC into a Spirit grammar" exercise: Parse into a complex struct with boost::spirit - note the conclusions in the comments.