Storing a non-root table of a FlatBuffers object for later deserialization


Consider the following FlatBuffers schema (from this Stack Overflow question):

table Foo {
    ...
}
table Bar {
    value:[Foo];
}
root_type Bar;

Assume the number of Foos in a typical object is significant, so we want to avoid modifying the schema to make Foo the root_type.

Scenario:

A C++ client serializes a proper FlatBuffers object and posts it to another component (a Node.js backend) that partially deserializes the object and stores the bytes representing each Foo in a database as separate documents:

const buf = new flatbuffers.ByteBuffer(req.body)
const bar = fbs.Bar.getRootAsBar(buf)
for (let i = 0; i < bar.valueLength(); i++) {
  const foo = bar.value(i)
  let item = {
    'raw': foo.bb.bytes_ // <-- primary suspect
  }
  // ... store `item` as an individual entity (mongodb doc)
}

Later, a third component fetches the binary data stored under the "raw" key of the MongoDB documents and tries to deserialize it into a Foo object:

auto mongoCol = db.collection("results");
auto mongoResult = mongoCol.find_one(
    bsoncxx::builder::stream::document{}
    << "_id" << oid << bsoncxx::builder::stream::finalize);
// ...check that mongoResult is not null
const auto result = mongoResult->view();
const auto& binary = result["raw"].get_binary();
std::string content(reinterpret_cast<const char*>(binary.bytes), binary.size);
const auto* foo = flatbuffers::GetRoot<fbs::Foo>(content.data());

The problem:

The pointer returned as foo does not point to the expected data, and any operation on foo can lead to a segfault or access violation.

Suspicions:

I suspect the root cause is that the binary stored in the database still uses offsets relative to the original message, so on its own it is invalid and the offsets would need to be readjusted before inserting it into the database. However, I do not see any FlatBuffers API for readjusting the offsets.

A less likely root cause may be that the final deserialization code is incomplete and has to readjust the offsets itself.

The reason I suspect offsets is that this same code works fine if we compromise and post smaller FlatBuffers objects with a single Foo element in each Bar vector (and change the backend code to store bar.bb.bytes in raw instead).

Question:

In any case, is it even possible to take the part of a larger, properly constructed FlatBuffers binary that you know represents your desired table and deserialize it on its own?


Solution

  • You can't simply copy a sub-table out of a larger FlatBuffer byte-wise, since its data (the table itself, its vtable, and anything it refers to, such as strings or vectors) is not necessarily contiguous. The best workaround is to make Bar store a [FooBuffer] instead, where table FooBuffer { buf:[ubyte] (nested_flatbuffer: "Foo"); }. To construct one of these, build each Foo in its own FlatBufferBuilder and store the resulting bytes in the parent. Then, when you need to store Foos separately, it becomes a simple copy of those bytes.
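
The contiguity point can be illustrated without the library. The sketch below is only a rough model: it assumes a hypothetical `table Foo { x:int; }` (the real fields are elided in the question) and hand-assembles the bytes of a minimal standalone buffer following the documented wire layout (root offset, vtable, table). The helpers `put_le`, `get_le`, `make_foo_buffer` and `read_foo_x` are made up for this illustration and are not FlatBuffers API; a real FlatBufferBuilder may order or pad things differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Append `width` bytes of `value` in little-endian order (FlatBuffers' wire order).
void put_le(Bytes& out, uint32_t value, int width) {
    for (int i = 0; i < width; ++i)
        out.push_back(static_cast<uint8_t>(value >> (8 * i)));
}

// Read `width` little-endian bytes starting at `pos`; the caller checks bounds.
uint32_t get_le(const Bytes& buf, uint64_t pos, int width) {
    uint32_t value = 0;
    for (int i = 0; i < width; ++i)
        value |= static_cast<uint32_t>(buf[pos + i]) << (8 * i);
    return value;
}

// Hand-assembled standalone buffer for the hypothetical `table Foo { x:int; }`.
Bytes make_foo_buffer(int32_t x) {
    Bytes b;
    put_le(b, 12, 4);  // byte 0:  root offset, the table starts at byte 12
    put_le(b, 6, 2);   // byte 4:  vtable, its own size in bytes
    put_le(b, 8, 2);   // byte 6:  vtable, size of the table's inline data
    put_le(b, 4, 2);   // byte 8:  vtable, field x sits 4 bytes into the table
    put_le(b, 0, 2);   // byte 10: padding to keep the table 4-byte aligned
    put_le(b, 8, 4);   // byte 12: table, its vtable is 8 bytes earlier (byte 4)
    put_le(b, static_cast<uint32_t>(x), 4);  // byte 16: table, value of x
    assert(b.size() == 20);
    return b;
}

// Follow root offset -> table -> vtable -> field, as a reader would.
// Returns false when an offset leads outside the bytes that were given.
bool read_foo_x(const Bytes& buf, int32_t& out) {
    const int64_t size = static_cast<int64_t>(buf.size());
    if (size < 4) return false;
    const int64_t table = get_le(buf, 0, 4);
    if (table + 4 > size) return false;
    const int64_t vtable = table - static_cast<int32_t>(get_le(buf, table, 4));
    if (vtable < 0 || vtable + 6 > size) return false;
    if (get_le(buf, vtable, 2) < 6) return false;  // vtable has no slot for x
    const int64_t field = get_le(buf, vtable + 4, 2);
    if (field == 0 || table + field + 4 > size) return false;
    out = static_cast<int32_t>(get_le(buf, table + field, 4));
    return true;
}
```

In this model, a parent that keeps each Foo as its own blob (for example a `std::vector<Bytes>`, standing in for `[FooBuffer]`) can hand out `Bytes stored = blobs[i];` and `read_foo_x(stored, x)` still succeeds, because every offset is relative to that blob. A slice that starts at byte 12 and carries only the table's inline data is rejected, since the vtable at byte 4 was left behind.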