
Boost.Spirit: how to parse a byte array preceded by its length?


I need to parse the following byte array "080100000113fc208dff01".

Here:

  • 1st byte "08": ID
  • 2nd byte "01": length of the 8-byte array (number of elements)
  • bytes 3-10: the 8-byte array element
  • 11th byte "01": length of the 8-byte array again (should be the same as the 2nd byte)
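To make the layout concrete, here is a hand-written decoder for the format exactly as listed above, with no Spirit involved. It is only a sketch under stated assumptions: the names `DecodedArray`, `hex_digit`, `read_bytes` and `decode_avl` are hypothetical, the input is assumed to be hex text without blanks, and each element is assumed to be exactly 8 bytes (no priority byte).

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type for the layout described above.
struct DecodedArray {
    unsigned codec_id = 0;
    std::vector<std::uint64_t> elements;
};

// Converts one hex digit to its value; returns -1 for anything else.
static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Reads `n` bytes (2*n hex digits, big-endian) starting at `pos` and advances `pos`.
static std::optional<std::uint64_t> read_bytes(const std::string& s, std::size_t& pos, int n) {
    std::uint64_t value = 0;
    for (int i = 0; i < 2 * n; ++i, ++pos) {
        if (pos >= s.size()) return std::nullopt;
        int d = hex_digit(s[pos]);
        if (d < 0) return std::nullopt;
        value = (value << 4) | static_cast<std::uint64_t>(d);
    }
    return value;
}

// ID byte, count byte, `count` 8-byte elements, then the count repeated.
std::optional<DecodedArray> decode_avl(const std::string& hex) {
    std::size_t pos = 0;
    DecodedArray out;
    auto id = read_bytes(hex, pos, 1);
    auto count = read_bytes(hex, pos, 1);
    if (!id || !count) return std::nullopt;
    out.codec_id = static_cast<unsigned>(*id);
    for (std::uint64_t i = 0; i < *count; ++i) {
        auto element = read_bytes(hex, pos, 8);
        if (!element) return std::nullopt;
        out.elements.push_back(*element);
    }
    // The trailing count must match the leading one and end the input.
    auto trailer = read_bytes(hex, pos, 1);
    if (!trailer || *trailer != *count || pos != hex.size()) return std::nullopt;
    return out;
}
```

With the question's input, `decode_avl("080100000113fc208dff01")` yields codec 8 and a single element 0x113fc208dff; changing the final byte to "02" makes it fail.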

I tried using qi::repeat(), followed the manual, and implemented the following parser (link to Coliru):

#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

typedef unsigned int BYTE;
typedef unsigned long long ULONGLONG;

struct AVLData
{
    ULONGLONG m_timestamp;
    BYTE m_priority;
};

struct AVLDataArray
{
    BYTE m_codecID;
    BYTE m_numOfData;
    std::vector<AVLData> m_data;
    BYTE m_numOfData_last;
};

BOOST_FUSION_ADAPT_STRUCT(AVLDataArray, m_codecID, m_numOfData, m_data, m_numOfData_last)

template <typename Iterator, typename Skipper = qi::ascii::blank_type>
    struct Grammar: qi::grammar <Iterator, AVLDataArray(), Skipper>
    {
        Grammar() : Grammar::base_type(avl_array)
        {
            qi::uint_parser<BYTE, 16, 2, 2> uint_byte_p;
            qi::uint_parser<unsigned long long, 16, 16, 16> uint_8_byte_p;

            avl_array = uint_byte_p > uint_byte_p[qi::_a = qi::_1] > qi::repeat(qi::_a)[uint_8_byte_p > uint_byte_p] > uint_byte_p;

            BOOST_SPIRIT_DEBUG_NODES((avl_array));
        }

    private:
        qi::rule<Iterator, AVLDataArray(), Skipper, qi::locals<BYTE>> avl_array;
};

int main() {
    std::string const input = "080100000113fc208dff01";

    auto f(begin(input)), l(end(input));
    Grammar<std::string::const_iterator> g;

    AVLDataArray array;
    bool ok = qi::phrase_parse(f,l,g,qi::blank,array);

    if (ok && f == l) 
    {
        std::cout << "Parse succeeded\n";
    } else
    {
        std::cout << "Parse failed\n";
        std::cout << "->stopped at [" + std::string(f, l) + "]";
    }

    return 0;
}

But for now, I'm facing two problems:

1) I'm not sure I understand how to use locals (a local with the same name) in two qi::rules. For example, is code like this valid?

data = qi::repeat(qi::_a)[uint_8_byte_p > uint_byte_p];
avl_array = uint_byte_p > uint_byte_p[qi::_a = qi::_1] > data > uint_byte_p;

2) My example does not compile; it fails with this error:

grammar.hpp:75:13: error: static assertion failed: incompatible_start_rule...

What am I doing wrong?

-Thanks


Solution

  • First things first:

    grammar.hpp:75:13: error: static assertion failed: incompatible_start_rule...

    means (surprise) that you are using an incompatible start rule. The offender is the locals<> argument, which is missing from the grammar base class declaration. Instead of adding that implementation detail to the public interface, consider using a wrapping start rule that invokes the real parser entry point, the one that does have the locals<> argument.


    Furthermore:

    • what is the m_priority member about? Your question doesn't mention it, and neither does the sample input (so it shouldn't parse: there is only the 8-byte element and no priority byte following it).

    • did you forget to adapt AVLData?

    • ignoring that, rules with semantic actions don't auto-propagate their attributes. This is fine here, because you probably don't need the redundant counts (m_numOfData and m_numOfData_last) in your AST node anyway.

      You can force automatic propagation by using operator%= instead of operator= to assign the rule definition.

    • You can use omit[] to leave a parser's attribute out of the synthesized attribute.

    • You probably want to validate the opening/closing bytes, e.g.:

      uint_byte_p(0x08)
      

      To check whether the closing byte matches the second one, say:

      qi::omit[uint_byte_p [ qi::_pass = (qi::_a == qi::_1) ] ]
      

      Thanks to @jv_ for making me double-check: you can indeed just say omit[uint_byte_p(_a)] there too.

    • If your grammar specifies ascii::blank_type as the skipper, you can't pass qi::blank for it; the types need to match. Once again: consider hiding the skipper behind a start rule instead of exposing the implementation detail.

    • Also, in this particular example I'd be surprised if you really want to accept blanks everywhere in the input string. Note too that the numeric parsers (uint_parser included) are implicitly lexemes, meaning the array elements and bytes cannot contain blanks even in this configuration. Check whether all of this matches your requirements.

    • Your use of expectation points practically guarantees that any parse failure throws, unless the first byte cannot be parsed: the first uint_byte_p isn't preceded by an expectation point (as it would be in qi::eps > uint_byte_p). Consider using >> to get normal sequence semantics.

    Fixing these issues results in working code:

    Live On Coliru

    //#define BOOST_SPIRIT_DEBUG
    #include <boost/fusion/adapted/struct.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <cstdint>
    #include <iomanip>
    #include <iostream>
    #include <string>
    #include <vector>
    
    namespace qi = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;
    
    typedef unsigned int BYTE; // what large bytes you have, grandma!?
    
    struct AVLData {
        uint64_t m_timestamp;
        BYTE m_priority;
    };
    
    struct AVLDataArray {
        BYTE m_codecID;
        std::vector<AVLData> m_data;
    };
    
    BOOST_FUSION_ADAPT_STRUCT(AVLData, m_timestamp, m_priority) // you need to adapt all your types
    BOOST_FUSION_ADAPT_STRUCT(AVLDataArray, m_codecID, m_data)
    
    template <typename Iterator, typename Skipper = ascii::blank_type>
        struct Grammar: qi::grammar <Iterator, AVLDataArray(), Skipper>
        {
            Grammar() : Grammar::base_type(start)
            {
                qi::uint_parser<BYTE, 16, 2, 2> uint_byte_p;
                qi::uint_parser<uint64_t, 16, 16, 16> uint_8_byte_p;
    
                avl_array %= uint_byte_p(0x08)
                          >> qi::omit[uint_byte_p[qi::_a = qi::_1]] 
                          >> qi::repeat(qi::_a)[uint_8_byte_p >> uint_byte_p]
                          >> qi::omit[uint_byte_p [ qi::_pass = (qi::_a == qi::_1) ] ]
                          ;
    
                start      = avl_array;
    
                BOOST_SPIRIT_DEBUG_NODES((avl_array)(start));
            }
    
        private:
            qi::rule<Iterator, AVLDataArray(), Skipper> start;
            qi::rule<Iterator, AVLDataArray(), Skipper, qi::locals<BYTE>> avl_array;
    };
    
    int main() {
        std::string const input = "080100000113fc208dff" /*priority:*/ "2a" /*end priority*/ "01";
    
        auto f(begin(input)), l(end(input));
        Grammar<std::string::const_iterator> g;
    
        AVLDataArray array;
        bool ok = qi::phrase_parse(f,l,g,ascii::blank,array);
    
        if (ok && f == l) 
        {
            std::cout << "Parse succeeded\n";
            std::cout << "Codec: " << array.m_codecID << "\n";
            for(auto& element : array.m_data)
                std::cout << "element: 0x" << std::hex << element.m_timestamp << " prio " << std::dec << element.m_priority << "\n";
        } else
        {
            std::cout << "Parse failed\n";
            std::cout << "->stopped at [" + std::string(f, l) + "]";
        }
    
        return 0;
    }
    

    Prints:

    Parse succeeded
    Codec: 8
    element: 0x113fc208dff prio 42
    

    And with debug info enabled:

    <start>
      <try>080100000113fc208dff</try>
      <avl_array>
        <try>080100000113fc208dff</try>
        <success></success>
        <attributes>[[8, [[1185345998335, 42]]]]</attributes><locals>(1)</locals>
      </avl_array>
      <success></success>
      <attributes>[[8, [[1185345998335, 42]]]]</attributes>
    </start>
    

    BONUS:

    Can I use the local across rules?

    No. You need to use inherited attributes instead:

    Live On Coliru

        data       = qi::repeat(qi::_r1)[uint_8_byte_p >> uint_byte_p]
                  ;
        avl_array %= uint_byte_p(0x08)
                  >> qi::omit[uint_byte_p[qi::_a = qi::_1]] 
                  >> data(qi::_a)
                  >> qi::omit[uint_byte_p [ qi::_pass = (qi::_a == qi::_1) ] ]
                  ;
    

    With the rules declared as:

    qi::rule<Iterator, std::vector<AVLData>(BYTE), Skipper> data;
    qi::rule<Iterator, AVLDataArray(),             Skipper, qi::locals<BYTE>> avl_array;