c++ struct dynamic memory-mapped-io

Is there an elegant way of parsing a byte buffer of dynamic length into a struct?


Background

As sketched at https://godbolt.org/z/xaf95qWee (mostly the same as the code below), I am consuming a library that offers a shared memory resource in the form of a memory-mapped file.

For statically sized messages the read method can very elegantly return a struct (that matches the buffer's layout) and the client has a nice typed interface, without having to worry about the internals.

template<typename DataType>
struct Message{
    //Metadata
    std::uint32_t type;
    std::uint32_t error;
    DataType data;
};
struct FixedLengthData{
    std::int32_t height;
    std::int32_t age;
};
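template<typename MessageType>
MessageType Read(){
    MessageType msg;
    // rawBuffer is the library's memory-mapped buffer (defined elsewhere)
    std::memcpy(&msg, rawBuffer, sizeof msg);
    return msg;
}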
const auto receivedMsg = Read<Message<FixedLengthData>>();

Problem / Question

However, some data payloads are dynamic arrays, encoded so that the buffer contains the size of the array S (i.e. the number of entries) followed by S entries of some known type (usually ints). An example might look like this: [type|type|error|error|size(e.g.4)|elem|elem|elem|elem|undef|...]

I was wondering whether there is a similarly elegant way of reading this dynamic structure, where the size is only known once the message is received.

struct DynamicLengthData{
    std::uint32_t size;
    std::array<std::int32_t, size> data; //obviously doesn't work.
};

What I have considered

One idea is to define the dynamic data with a std::vector member. The "problem" with this approach is that the vector's elements live on the heap, not inside the struct itself, so "direct" initialization with a single memcpy won't work. Of course, I could define the struct without the vector, up to and including the size member, and then in a second step read the size and read exactly that many ints from the buffer, starting at the corresponding offset. But I was looking for a way without this second step.

struct StaticPartOfDynamicData{
  //possibly other members
  std::uint32_t size;
};
const auto msg = Read<Message<StaticPartOfDynamicData>>();
std::vector<std::int32_t> dynamicData;
// for 0 to msg.data.size fill vector by reading from buffer at offset sizeof(type + error + otherData + size)
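
A minimal sketch of the two-step read described above, assuming C++17, a header of three tightly packed std::uint32_t fields (type, error, size) and std::int32_t entries. The names DynamicHeader, DynamicMessage and ReadDynamic are hypothetical, and the buffer pointer and its byte length are passed in explicitly instead of coming from the library:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Fixed-size prefix of the message: metadata plus the element count.
// (Assumed layout; three 4-byte fields, so no padding is expected.)
struct DynamicHeader {
    std::uint32_t type;
    std::uint32_t error;
    std::uint32_t size;  // number of std::int32_t entries that follow
};

// Owning, typed result handed to the client.
struct DynamicMessage {
    std::uint32_t type;
    std::uint32_t error;
    std::vector<std::int32_t> data;
};

// Hypothetical helper: decodes one message from `buffer` (`bufferSize` bytes).
// Returns std::nullopt if the buffer is too short for the header or payload.
std::optional<DynamicMessage> ReadDynamic(const std::byte* buffer,
                                          std::size_t bufferSize) {
    // Step 1: copy the fixed-size part.
    DynamicHeader header{};
    if (bufferSize < sizeof header) return std::nullopt;
    std::memcpy(&header, buffer, sizeof header);

    // Reject element counts that would read past the end of the buffer.
    const std::size_t available =
        (bufferSize - sizeof header) / sizeof(std::int32_t);
    if (header.size > available) return std::nullopt;

    // Step 2: copy exactly `size` entries into the vector.
    DynamicMessage msg{header.type, header.error,
                       std::vector<std::int32_t>(header.size)};
    if (header.size != 0) {
        std::memcpy(msg.data.data(), buffer + sizeof header,
                    header.size * sizeof(std::int32_t));
    }
    return msg;
}
```

The second step stays hidden inside the helper, so the client still sees a single typed call; the cost is one heap allocation per message.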

Another idea: because the buffer has a maximum size, I could create a C-array member that is as large as possible. This could be directly initialized, but most of the array would be unused, which does not sound efficient (I know not to optimize prematurely, but this is mostly a theoretical question at this point and not for a production system).
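
A rough sketch of this fixed-capacity variant, assuming C++17 and an upper bound of 64 entries. The capacity kMaxEntries and the names BoundedData and ReadFrom are made up for illustration, and the buffer is passed in as a pointer and must hold at least sizeof(MessageType) readable bytes:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kMaxEntries = 64;  // assumed capacity, not from the library

struct BoundedData {
    std::uint32_t size;                          // number of valid entries
    std::array<std::int32_t, kMaxEntries> data;  // entries past `size` are undefined

    // Iterate only over the valid entries; a corrupt `size` is clamped.
    const std::int32_t* begin() const { return data.data(); }
    const std::int32_t* end() const {
        return data.data() + std::min<std::size_t>(size, kMaxEntries);
    }
};

template <typename DataType>
struct Message {
    std::uint32_t type;
    std::uint32_t error;
    DataType data;
};

// Same single memcpy as the fixed-length Read, but with an explicit source.
// Assumes `buffer` holds at least sizeof(MessageType) readable bytes.
template <typename MessageType>
MessageType ReadFrom(const void* buffer) {
    MessageType msg;
    std::memcpy(&msg, buffer, sizeof msg);
    return msg;
}
```

With this layout a range-for over msg.data visits only the valid entries, but every read copies the full capacity regardless of how many entries are in use.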


Solution

  • An example of how I handle it in my code.

    class packet
    {
    public:
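      packet(absl::Span<const char> data)
      {
        // read the length prefix, then step past it
        auto current = data.data();
        std::memcpy(&length_, current, sizeof(length_));
        std::advance(current, sizeof(length_));

        // copy the length_ payload bytes that follow the prefix
        // (assumes data holds at least sizeof(length_) + length_ bytes)
        vec_.assign(current, current + length_);
      }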
      
      //public stuff as needed
    
    private:
      std::vector<char> vec_{};
      uint16_t length_{};
      //...other members
    };
    

    To deserialize the object, all you have to do is something like packet{{data_ptr, data_len}};

    I have a helper function that removes a lot of the duplication and boilerplate of deserializing multiple members, but it's not important to the example.

    This should fit nicely into your read method:

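    template<typename MessageType>
    MessageType Read(){
        // rawBufferSize is a placeholder for the byte length of rawBuffer
        return MessageType{{rawBuffer, rawBufferSize}};
    }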
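
For illustration, here is one way the constructor pattern above could be applied to the question's layout (type, error, size, then size entries of std::int32_t), with a small cursor standing in for the helper function mentioned earlier. This is only a sketch assuming C++17; ByteReader and DynamicPacket are hypothetical names and are not the answerer's actual helper:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical cursor over a read-only byte range.
class ByteReader {
public:
    ByteReader(const char* data, std::size_t size) : cur_(data), left_(size) {}

    std::size_t Remaining() const { return left_; }

    // Copies one trivially copyable value out of the buffer and advances.
    template <typename T>
    T Take() {
        static_assert(std::is_trivially_copyable_v<T>,
                      "Take only supports trivially copyable types");
        if (left_ < sizeof(T)) throw std::out_of_range("buffer too short");
        T value;
        std::memcpy(&value, cur_, sizeof(T));
        cur_ += sizeof(T);
        left_ -= sizeof(T);
        return value;
    }

private:
    const char* cur_;
    std::size_t left_;
};

class DynamicPacket {
public:
    DynamicPacket(const char* data, std::size_t size) {
        ByteReader reader(data, size);
        type_ = reader.Take<std::uint32_t>();
        error_ = reader.Take<std::uint32_t>();
        const auto count = reader.Take<std::uint32_t>();
        // Validate the count before allocating anything.
        if (count > reader.Remaining() / sizeof(std::int32_t)) {
            throw std::out_of_range("element count exceeds buffer");
        }
        values_.reserve(count);
        for (std::uint32_t i = 0; i < count; ++i) {
            values_.push_back(reader.Take<std::int32_t>());
        }
    }

    std::uint32_t type() const { return type_; }
    std::uint32_t error() const { return error_; }
    const std::vector<std::int32_t>& values() const { return values_; }

private:
    std::uint32_t type_{};
    std::uint32_t error_{};
    std::vector<std::int32_t> values_{};
};
```

A type like this can be returned from a generic Read in the same way as the fixed-length messages, as long as Read passes the buffer pointer and its byte length to the constructor.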