In a distributed computing setup, I want to send an unordered_set to many nodes. I am aware of serialization in C++, e.g. using boost::serialization. My beef with serialization is that I have to pay the cost of rebuilding the unordered_set data structure on every node after receiving the serialized data.
My idea is to write a custom allocator for unordered_set that allocates a fixed-size block of contiguous memory and returns the starting address after allocation. Then I want to grab the byte representation of the unordered_set, send it over the wire, and tell the receiving node that this chunk of memory is an unordered_set.
Would that work? Do you guys have alternative ideas for how to tackle my problem? Or do you have any relevant pointers, e.g. on writing such an allocator? Any feedback is appreciated.
Thank you!
This is probably a bad idea for several reasons:
I would just send over a flat list of the elements (key/value pairs, in the case of a map) and insert them into a hash table constructed on the receiving end.
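A minimal sketch of that approach, assuming the elements are plain 64-bit integers and that the actual network transfer of the buffer is handled elsewhere. The function names pack and unpack are made up for this illustration and are not part of any library:

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Sender side: copy the elements into a contiguous buffer.
// The buffer holds only the values, with no buckets or node pointers.
std::vector<std::int64_t> pack(const std::unordered_set<std::int64_t>& s) {
    return std::vector<std::int64_t>(s.begin(), s.end());
}

// Receiver side: rebuild the set from the received elements.
std::unordered_set<std::int64_t> unpack(const std::vector<std::int64_t>& buf) {
    std::unordered_set<std::int64_t> s;
    s.reserve(buf.size());             // size the bucket array once, up front
    s.insert(buf.begin(), buf.end());  // hash and insert each element
    return s;
}
```

Calling reserve before inserting should avoid repeated rehashing while the set is rebuilt, which is usually the most expensive part of reconstruction. For element types that are not trivially copyable (std::string, for example), the buffer would need a real encoding instead of a raw array.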
Also, keep in mind that when sending something over the wire, the processing cost is often minor compared to the bandwidth cost. Hash tables are space-inefficient -- they need lots of empty buckets in order to maintain near-O(1) performance. As a result, overall performance would likely be worse even if you could find a way to send the hash table across the wire as-is.
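To get a rough feel for how much table structure exists beyond the elements themselves, one can inspect the bucket array. This is only a sketch: TableStats and inspect are hypothetical names, and the exact bucket counts depend on the standard library implementation.

```cpp
#include <cstddef>
#include <unordered_set>

// Summary of the table structure that would travel along with the data
// if the container were shipped as raw bytes.
struct TableStats {
    std::size_t elements;
    std::size_t buckets;
    std::size_t empty_buckets;
};

TableStats inspect(const std::unordered_set<int>& s) {
    TableStats st{s.size(), s.bucket_count(), 0};
    for (std::size_t b = 0; b < s.bucket_count(); ++b) {
        if (s.bucket_size(b) == 0) {
            ++st.empty_buckets;  // bucket slot that holds no element
        }
    }
    return st;
}
```

With the default max_load_factor of 1.0, the bucket count is at least the element count, so a raw copy carries at least one bucket slot per element plus per-node bookkeeping, none of which a flat list of elements needs.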