How to use flatbuffers in python the right way?


I have two questions about using FlatBuffers in Python, both about how to use them the right way without writing code that utterly defeats their performance advantage. I want to use FlatBuffers for serialization and network communication between a C# program and a Python program. I have read the tutorial, the Python-specific documentation, and some blog posts that use FlatBuffers with other languages, but couldn't find one for Python.

1.) FlatBuffers are meant for fast serialization. Is this even true for Python? The performance rating for Python just states "Ok" where other languages get "Great", and specific timings are missing. I know that Python is generally not as fast as C or C++, but how slow are we talking? Slow enough to defeat its promised performance advantage (for example, compared to JSON)? Maybe someone has already done a benchmark with Python? If not, I will try to write one that compares timings between C# and Python, and also FlatBuffers vs. JSON in Python.

2.) It is fast because of "zero copy". But what does that mean for a program that needs to alter the data, especially since the objects are immutable? In order to work with them I need to copy the values into my local representation of the objects anyway. Doesn't that defeat the purpose? The tutorial gives this example for reading from a FlatBuffer:

import MyGame.Example as example
import flatbuffers
buf = open('monster.dat', 'rb').read()
buf = bytearray(buf)
monster = example.GetRootAsMonster(buf, 0)
hp = monster.Hp()
pos = monster.Pos()

Aren't those last two lines just copies?


Solution

  • The design of FlatBuffers heavily favors languages like C/C++/Rust in attaining maximum speed. The Python implementation mimics what these languages do, but that is very unnatural for Python, so it is not the fastest serializer design you would get if you designed purely for Python.

    I haven't benchmarked anything on Python, but a Python-specific design would certainly beat FlatBuffers-Python in many cases. One case where the FlatBuffers design will win even in Python is large files that are accessed sparsely or randomly, since it doesn't actually unpack all the data at once.

    You typically use FlatBuffers because you have the performance-critical part of your stack in a faster language, and then you also want to be able to process the data in Python elsewhere. If you work purely in Python, however, FlatBuffers is possibly not your best pick (unless, again, you work with large sparse data).

    Better, of course, is not to do your heavy lifting in Python in the first place.
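To make the "accessed sparsely or randomly" point concrete, here is a minimal sketch using only the standard library. It does not use the FlatBuffers wire format or API: the fixed-size record layout, the field names, and all helper functions are hypothetical, chosen only to contrast reading one field at a known offset with parsing a whole JSON document first.

```python
import json
import struct
import timeit

# Hypothetical fixed-size record (NOT the FlatBuffers wire format):
# id (uint32), hp (int16), mana (int16), x, y, z (float32), little-endian, no padding.
RECORD = struct.Struct("<Ihh3f")
HP_OFFSET = 4  # hp follows the 4-byte id


def build_binary(count):
    """Pack `count` records back to back into one bytes object."""
    buf = bytearray(RECORD.size * count)
    for i in range(count):
        RECORD.pack_into(buf, i * RECORD.size, i, 100 + i % 50, 150, 1.0, 2.0, 3.0)
    return bytes(buf)


def build_json(count):
    """Serialize the same records as one JSON array."""
    records = [
        {"id": i, "hp": 100 + i % 50, "mana": 150, "pos": [1.0, 2.0, 3.0]}
        for i in range(count)
    ]
    return json.dumps(records)


def hp_from_binary(buf, index):
    """Decode only the 2-byte hp field of one record; the rest of the buffer is untouched."""
    (hp,) = struct.unpack_from("<h", buf, index * RECORD.size + HP_OFFSET)
    return hp


def hp_from_json(text, index):
    """JSON carries no offsets, so the whole document is parsed before one field is read."""
    return json.loads(text)[index]["hp"]


if __name__ == "__main__":
    n = 10_000
    binary, text = build_binary(n), build_json(n)
    # Timings depend on the machine; no specific numbers are claimed here.
    print("binary:", timeit.timeit(lambda: hp_from_binary(binary, n // 2), number=100))
    print("json:  ", timeit.timeit(lambda: hp_from_json(text, n // 2), number=100))
```

The cost of the binary read does not grow with the number of records, while the JSON read pays for the whole document every time. If you read every field of every record instead, that advantage disappears, which is the case where a Python-specific serializer may well come out ahead.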