c++ serialization leveldb capnproto

Cap'n Proto - De-/Serialize struct to/from std::string for storing in LevelDB


I want to store a Cap'n Proto struct in LevelDB, so I have to serialize it to a std::string and later deserialize it from a std::string again. Currently I'm experimenting with the following (adapted from https://groups.google.com/forum/#!msg/capnproto/viZXnQ5iN50/B-hSgZ1yLWUJ):

// Build the message.
capnp::MallocMessageBuilder message;
WortData::Builder twort = message.initRoot<WortData>();
twort.setWid(1234);
twort.setW("Blabliblub");

// Serialize: flatten to one word array, then copy the bytes into a std::string.
kj::Array<capnp::word> dataArr = capnp::messageToFlatArray(message);
kj::ArrayPtr<kj::byte> bytes = dataArr.asBytes();
std::string data(bytes.begin(), bytes.end());
std::cout << data << std::endl;  // binary data, so the output is not readable text

// Deserialize: view the string's buffer as words (assumes the buffer is word-aligned).
const kj::ArrayPtr<const capnp::word> view(
    reinterpret_cast<const capnp::word*>(data.data()),
    data.size() / sizeof(capnp::word));
capnp::FlatArrayMessageReader message2(view);
WortData::Reader wortRestore = message2.getRoot<WortData>();
std::cout << wortRestore.getWid() << " " << std::string(wortRestore.getW()) << std::endl;

This basically works, but the people in the linked thread were unsure whether this approach could cause errors later, and since the discussion is fairly old, I wanted to ask whether there is a better way. Near the end someone said something like "use memcpy!", but I'm not sure whether that is useful or how to do it with the array types that FlatArrayMessageReader needs.

Thanks in advance!
dvs23

Update:

I tried to implement the suggestion about word alignment:

capnp::MallocMessageBuilder message;
WortData::Builder twort = message.initRoot<WortData>();
twort.setWid(1234);
twort.setW("Blabliblub");
kj::Array<capnp::word> dataArr = capnp::messageToFlatArray(message);
kj::ArrayPtr<kj::byte> bytes = dataArr.asBytes();
std::string data(bytes.begin(), bytes.end());
std::cout << data << std::endl;

if(reinterpret_cast<uintptr_t>(data.data()) % sizeof(void*) == 0) {
    // Buffer is aligned: read the message in place without copying.
    const kj::ArrayPtr<const capnp::word> view(
        reinterpret_cast<const capnp::word*>(data.data()),
        data.size() / sizeof(capnp::word));
    capnp::FlatArrayMessageReader message2(view);
    WortData::Reader wortRestore = message2.getRoot<WortData>();
    std::cout << wortRestore.getWid() << " " << std::string(wortRestore.getW()) << std::endl;
}
else {
    // Buffer is not aligned: copy the bytes into word-aligned storage first.
    size_t numWords = data.size() / sizeof(capnp::word);

    if(data.size() % sizeof(capnp::word) != 0) {
        // A flat message is a whole number of words, so a remainder points to bad data.
        numWords++;
        std::cout << "Something wrong here..." << std::endl;
    }

    std::cout << sizeof(capnp::word) << " " << numWords << " " << data.size() << std::endl;

    // Heap-allocated array instead of a variable-length array, which is not standard C++.
    kj::Array<capnp::word> dataWords = kj::heapArray<capnp::word>(numWords);
    std::memset(dataWords.begin(), 0, numWords * sizeof(capnp::word));
    std::memcpy(dataWords.begin(), data.data(), data.size());
    capnp::FlatArrayMessageReader message2(dataWords.asPtr());
    WortData::Reader wortRestore = message2.getRoot<WortData>();
    std::cout << wortRestore.getWid() << " " << std::string(wortRestore.getW()) << std::endl;
}

Solution

  • The linked conversation is still accurate to the best of my knowledge. (Most of the messages on that thread are me, and I'm the author of Cap'n Proto...)

    It's very likely that the buffer backing any std::string will be word-aligned in practice -- but it is not guaranteed. When reading from a std::string, you should probably check that the pointer is aligned (e.g. by reinterpret_cast<uintptr_t>(str.data()) % sizeof(void*) == 0). If aligned, you can reinterpret_cast the pointer to capnp::word*. If not aligned, you'll need to make a copy. In practice the code will probably never make a copy because std::string's backing buffer is probably always aligned.

    On the writing end, avoiding copies is trickier. Your code as you've written it actually makes two copies.

    One here:

    kj::Array<capnp::word> dataArr = capnp::messageToFlatArray(message);
    

    And one here:

    std::string data(bytes.begin(), bytes.end());
    

    It looks like LevelDB supports a type called Slice, which you can use instead of std::string when writing, to avoid the second copy:

    leveldb::Slice data(reinterpret_cast<const char*>(bytes.begin()), bytes.size());
    

    This will reference the underlying bytes rather than make a copy, so dataArr must stay alive for as long as the Slice is in use. It should be usable in all the LevelDB write functions.

    Unfortunately, one copy is unavoidable here, because LevelDB wants the value to be one contiguous byte array, whereas a Cap'n Proto message may be broken into multiple segments. The only way to avoid this would be for LevelDB to add support for "gather writes".
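
The read-side advice above (check the alignment, reinterpret the pointer if it is aligned, copy otherwise) can be sketched without any Cap'n Proto dependency. This is only a minimal illustration under stated assumptions: isAligned, alignedBytes and kWordSize are hypothetical names that are not part of Cap'n Proto or LevelDB, and a std::vector<std::uint64_t> stands in for word-aligned storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumption for this sketch: one word is 8 bytes, like capnp::word.
constexpr std::size_t kWordSize = 8;

// Same alignment test as suggested in the answer.
bool isAligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % sizeof(void*) == 0;
}

// Hypothetical helper: returns a pointer to `size` bytes with the same content
// as [bytes, bytes + size) whose address passes isAligned().
// If the input is already aligned, the input pointer is returned and nothing
// is copied. Otherwise the bytes are copied into `scratch`, which must outlive
// the returned pointer.
const unsigned char* alignedBytes(const char* bytes, std::size_t size,
                                  std::vector<std::uint64_t>& scratch) {
    if (isAligned(bytes)) {
        return reinterpret_cast<const unsigned char*>(bytes);
    }
    // Round up to whole words; zero-fill so any padding bytes are defined.
    scratch.assign((size + kWordSize - 1) / kWordSize, 0);
    if (size > 0) {
        std::memcpy(scratch.data(), bytes, size);
    }
    assert(isAligned(scratch.data()));
    return reinterpret_cast<const unsigned char*>(scratch.data());
}
```

In real code the returned pointer would be reinterpret_cast to const capnp::word* and passed, together with size / sizeof(capnp::word), to capnp::FlatArrayMessageReader. In the copy case the scratch vector has to stay alive for as long as the reader is used.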