c++ serialization distributed-computing unordered-set

C++ Get byte representation of unordered_set *without* serialization


In a distributed computing setup, I want to send an unordered_set to many nodes. I am aware of serialization in C++, e.g. using boost::serialization. My beef with serialization is that every node pays the cost of rebuilding the unordered_set data structure after receiving the serialized data.

My idea is to write a custom allocator for unordered_set that allocates a fixed-size block of contiguous memory and returns the starting memory address after allocation. Then I want to grab the byte representation of the unordered_set, send it over the wire, and tell the receiving node that this chunk of memory is an unordered_set.

Would that work? Do you guys have alternative ideas how to tackle my problem? Or do you have any relevant pointers e.g. to writing such an allocator? Any feedback is appreciated.

Thank you!


Solution

  • This is probably a bad idea for several reasons:

    1. The hash function often differs from one machine to the next (the standard only requires std::hash to be consistent within a single run of a program), so your hash table wouldn't be valid any more on the receiving machine.
    2. The hash table implementation will usually contain pointers. You can't just copy pointers from one machine to another; they're absolute addresses (on most platforms) and mean nothing in another process's address space.
    3. It is possible that the sending machine and receiving machine have different byte orders (say the sending machine is x86 and the receiving machine is PowerPC, e.g. an Xbox 360), in which case the results you'll get will be complete gibberish.

    I would just send over a flat list of the elements (or key/value pairs, for a map) and insert them into a hash table constructed on the receiving end.
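
    A minimal sketch of that approach, assuming the elements are std::uint32_t and the wire format is simply a count followed by the values in little-endian order. The helper names (put_u32_le, get_u32_le, encode_set, decode_set) are made up for illustration and the actual network transfer is left out:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <unordered_set>
#include <vector>

// Hypothetical helpers for illustration; not part of any library.

// Append a 32-bit value in little-endian order, independent of host byte order.
void put_u32_le(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFFu));
    }
}

// Read a 32-bit little-endian value starting at byte offset pos.
std::uint32_t get_u32_le(const std::vector<std::uint8_t>& in, std::size_t pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    }
    return v;
}

// Sender: element count, then the elements in whatever order the set iterates.
std::vector<std::uint8_t> encode_set(const std::unordered_set<std::uint32_t>& s) {
    assert(s.size() <= UINT32_MAX);  // the count must fit in the 4-byte header
    std::vector<std::uint8_t> out;
    out.reserve(4 * (s.size() + 1));
    put_u32_le(out, static_cast<std::uint32_t>(s.size()));
    for (std::uint32_t x : s) {
        put_u32_le(out, x);
    }
    return out;
}

// Receiver: rebuild the set using the local hash function.
std::unordered_set<std::uint32_t> decode_set(const std::vector<std::uint8_t>& in) {
    if (in.size() < 4) {
        throw std::runtime_error("truncated header");
    }
    const std::uint32_t n = get_u32_le(in, 0);
    if (in.size() != 4 * (static_cast<std::size_t>(n) + 1)) {
        throw std::runtime_error("length mismatch");
    }
    std::unordered_set<std::uint32_t> s;
    s.reserve(n);  // size the bucket array once so insertion does not rehash
    for (std::size_t i = 0; i < n; ++i) {
        s.insert(get_u32_le(in, 4 + 4 * i));
    }
    return s;
}
```

    The sender would call encode_set and ship the resulting bytes; each node calls decode_set on what it receives. Calling reserve before inserting is what keeps the rebuild cost down to one allocation of the bucket array plus one hash per element.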

    Also, keep in mind that when sending something over the wire, the processing cost is often minor compared to the bandwidth cost. Hash tables are space-inefficient -- they need lots of empty buckets in order to maintain near-O(1) performance. As a result, it is likely that overall performance would be worse even if you could implement a way to send the hash table across the wire as-is.
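
    To get a feel for the size difference, here is a rough sketch that compares the flat payload with a back-of-the-envelope estimate of the table's footprint. The Footprint struct and measure function are hypothetical, and the estimate assumes a typical node-based implementation (one pointer per bucket, one pointer per node); real implementations differ and usually add allocator overhead on top:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical summary type for illustration.
struct Footprint {
    std::size_t elements;      // number of stored values
    std::size_t buckets;       // size of the bucket array
    std::size_t flat_bytes;    // bytes needed to send the values alone
    std::size_t table_bytes;   // rough estimate of the in-memory table size
};

Footprint measure(const std::unordered_set<std::uint32_t>& s) {
    Footprint f;
    f.elements = s.size();
    f.buckets = s.bucket_count();
    f.flat_bytes = s.size() * sizeof(std::uint32_t);
    // Assumption: each bucket holds one pointer, and each node holds the
    // value plus one "next" pointer. Padding and allocator bookkeeping
    // are ignored, so this is only an approximation.
    f.table_bytes = s.bucket_count() * sizeof(void*)
                  + s.size() * (sizeof(std::uint32_t) + sizeof(void*));
    return f;
}
```

    Under these assumptions the table estimate is always larger than the flat payload for a non-empty set, since every element carries at least one extra pointer before the bucket array is even counted.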