I just learned about pointer swizzling and I am quite unsure of the actual use of it.
For example, let's say I have a Windows service serializing an object containing pointers using some kind of pointer swizzling and then deserializing it in a different process.
What are the preconditions for it to work?
It looks to me like it would fail, because the addresses the pointers refer to are in another process's memory space and the OS will not allow the new process to access them. Pointer swizzling makes the pointers survive the change of context, but that alone is not sufficient, is it? Wouldn't it also require the data to live in some kind of shared memory segment in the first place, or am I wrong?
I would also be curious to see actual examples in C++, if you happen to know any, using Boost or a similar library.
When you have swizzled pointers, you cannot follow them (efficiently) until they are unswizzled.
Imagine if you have a bunch of records, each with links (pointers) to other records, in some arbitrary graph.
One naive way to swizzle these is to take the binary value of the pointer as a UID, and serialize this. While we do this we also maintain a table of record address to order in serialization, and we serialize that last. Call this the swizzle table.
When we deserialize, we load up the data structures, and we build a table of (order in serialization) to (new record address in memory). Then we load up the swizzle table, which is a map from (old address) to (order in serialization).
We merge those two tables and we get a (old address) to (new record address in memory) table -- the unswizzle table.
Next, we go over our deserialized records and, for each pointer, we apply this map. The old binary value of each address is stored in some pointer; we look it up in the unswizzle table and replace it. Now each pointer points at the address of the record in the new address space.
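// Sketch only: in real code, define OutArch and InArch (below) before node,
// and include <cstddef>, <cstdint>, <functional>, <map>, and <vector>.
struct node {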
std::vector<node*> links;
void write( OutArch& out ) const& {
out.register_swizzle(this);
out << links.size();
for (node* n:links) {
out << out.swizzle(n);
}
}
static node* read( InArch& in ) {
auto* r = new node;
in.register_unswizzle( r );
std::size_t n;
in >> n;
r->links.reserve(n);
for (std::size_t i = 0; i<n; ++i) {
std::intptr_t ptr;
in >> ptr;
r->links.push_back( reinterpret_cast<node*>(ptr) ); // danger: still the sender's address; do not dereference until unswizzled
}
return r;
}
friend void do_unswizzle( InArch& in, node* n ) {
for (node*& link : n->links ) {
link = in.unswizzle(link);
}
}
};
struct OutArch {
friend void operator<<( OutArch& arch, std::size_t count ); //TODO
friend void operator<<( OutArch& arch, std::intptr_t ptr ); //TODO
std::intptr_t swizzle( void* ptr ) {
return reinterpret_cast<std::intptr_t>(ptr);
}
void register_swizzle( const void* ptr ) { // const so that write() can pass `this`
swizzle_table.insert( {reinterpret_cast<std::intptr_t>(ptr), record_number} );
++record_number;
}
private:
// incremented once per registered record
std::size_t record_number = 0;
std::map< std::intptr_t, std::size_t > swizzle_table;
};
struct InArch {
friend void operator>>( InArch& arch, std::size_t& count ); //TODO
friend void operator>>( InArch& arch, std::intptr_t& ptr ); //TODO
template<class T>
void register_unswizzle( T* t ) {
unswizzle_table.insert( {record_number, t} );
++record_number;
unswizzle_tasks.push_back([t](InArch* self){
do_unswizzle( *self, t );
});
}
struct unswizzler_t {
void* ptr;
template<class T>
operator T*()&&{return static_cast<T*>(ptr);}
};
unswizzler_t unswizzle( void* ptr ) {
auto p = reinterpret_cast<std::intptr_t>(ptr);
auto it1 = swizzle_table.find(p);
if (it1 == swizzle_table.end()) return {nullptr};
auto it2 = unswizzle_table.find(it1->second);
if (it2 == unswizzle_table.end()) return {nullptr};
return { it2->second };
}
void load_swizzle_table(); //TODO
void execute_unswizzle() {
for (auto&& task: unswizzle_tasks) {
task(this);
}
}
private:
// incremented once per registered record
std::size_t record_number = 0;
std::map< std::size_t, void* > unswizzle_table;
std::map< std::intptr_t, std::size_t > swizzle_table;
std::vector< std::function< void(InArch*) > > unswizzle_tasks;
};
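The table merge described earlier can also be looked at in isolation. The following is only a minimal sketch, assuming C++17; the type aliases and the functions `merge_tables` and `unswizzle` are hypothetical names made up for this illustration and are not part of the classes above.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// old address -> position of that record in the stream (written by the sender)
using SwizzleTable = std::map<std::intptr_t, std::size_t>;
// position in the stream -> address of the freshly allocated record (built by the receiver)
using OrderTable = std::map<std::size_t, void*>;
// old address -> new address
using UnswizzleTable = std::map<std::intptr_t, void*>;

// Join the two tables on the position in the stream.
UnswizzleTable merge_tables(const SwizzleTable& swizzle, const OrderTable& order) {
    UnswizzleTable merged;
    for (const auto& [old_address, index] : swizzle) {
        auto it = order.find(index);
        // An old address whose record was never loaded maps to nullptr.
        merged[old_address] = (it == order.end()) ? nullptr : it->second;
    }
    return merged;
}

// Translate one stale pointer value; unknown addresses become nullptr.
void* unswizzle(const UnswizzleTable& table, std::intptr_t old_address) {
    auto it = table.find(old_address);
    return it == table.end() ? nullptr : it->second;
}
```

Once the merged table exists, the old addresses are never dereferenced; they are only used as lookup keys, which is why the receiving process does not need access to the sender's memory.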
There are many ways to swizzle. Instead of saving the binary value of the pointer, you can save the position at which its target will be serialized (for example); but this requires a bit of careful preprocessing or time travel, as you'll have references to structures you haven't serialized yet.
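The "careful preprocessing" for the position-based variant can be as simple as one pass that assigns every record its position before any link is written. What follows is a rough sketch under assumptions, not a definitive implementation: `Node`, `Record`, `save` and `load` are hypothetical names, the "stream" is just a vector of records in memory, and every link is assumed to be non-null and to point at a node that is being saved.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

struct Node {
    int value = 0;
    std::vector<Node*> links;
};

// Hypothetical flat record: links are stored as positions, not addresses.
struct Record {
    int value;
    std::vector<std::size_t> link_indices;
};

std::vector<Record> save(const std::vector<Node*>& nodes) {
    // Pre-pass: fix every node's position before writing any link.
    std::map<const Node*, std::size_t> index_of;
    for (std::size_t i = 0; i < nodes.size(); ++i) index_of[nodes[i]] = i;

    std::vector<Record> out;
    for (const Node* n : nodes) {
        Record r{n->value, {}};
        for (const Node* target : n->links) {
            auto it = index_of.find(target);
            assert(it != index_of.end() && "link target must be in the saved set");
            r.link_indices.push_back(it->second);
        }
        out.push_back(std::move(r));
    }
    return out;
}

std::vector<std::unique_ptr<Node>> load(const std::vector<Record>& records) {
    // Allocate every node first so forward references have a target.
    std::vector<std::unique_ptr<Node>> nodes;
    for (const Record& r : records) {
        nodes.push_back(std::make_unique<Node>());
        nodes.back()->value = r.value;
    }
    // Second pass: turn positions back into pointers.
    for (std::size_t i = 0; i < records.size(); ++i)
        for (std::size_t target : records[i].link_indices)
            nodes[i]->links.push_back(nodes.at(target).get());
    return nodes;
}
```

With this layout no swizzle table has to be written at all, since the position in the stream is the identifier. The price is the extra pass on the saving side and the two-pass load.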
Or you could generate a guid, write the guid out with each record, and keep a swizzle table of {record address} to {guid} in the old process. As you save records, you see if the pointers are in your swizzle table; if not, add them. Then write the guid instead of the pointer. Don't write the swizzle table in this case; the unswizzle table of {guid} to {record address} can be constructed just from a guid header on each record. Then using that unswizzle table, rebuild the records on the destination side.
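A sketch of that last variant might look like the following. It is an illustration under assumptions rather than a finished design: a simple counter stands in for a real GUID generator, 0 is reserved to mean a null pointer, and `Node`, `Record`, `Saver` and `load` are hypothetical names (C++17 is assumed for `try_emplace` and structured bindings).

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

struct Node {
    int value = 0;
    std::vector<Node*> links;
};

using Id = std::uint64_t;  // stand-in for a real GUID; 0 means "null"

struct Record {
    Id id;                 // header: this record's own id
    int value;
    std::vector<Id> link_ids;
};

class Saver {
public:
    // Look up the id for an address, handing out a fresh one on first sight.
    Id id_for(const Node* n) {
        if (!n) return 0;
        auto [it, inserted] = ids_.try_emplace(n, next_id_);
        if (inserted) ++next_id_;
        return it->second;
    }
    Record save(const Node& n) {
        Record r{id_for(&n), n.value, {}};
        for (const Node* target : n.links) r.link_ids.push_back(id_for(target));
        return r;
    }
private:
    std::map<const Node*, Id> ids_;  // swizzle table; stays in the old process
    Id next_id_ = 1;
};

std::vector<std::unique_ptr<Node>> load(const std::vector<Record>& records) {
    std::vector<std::unique_ptr<Node>> nodes;
    std::map<Id, Node*> by_id;       // unswizzle table, rebuilt from the headers
    for (const Record& r : records) {
        nodes.push_back(std::make_unique<Node>());
        nodes.back()->value = r.value;
        by_id[r.id] = nodes.back().get();
    }
    for (std::size_t i = 0; i < records.size(); ++i)
        for (Id id : records[i].link_ids) {
            auto it = by_id.find(id);
            // Ids with no matching record (including 0) become nullptr.
            nodes[i]->links.push_back(it == by_id.end() ? nullptr : it->second);
        }
    return nodes;
}
```

Because ids are handed out lazily, a record can refer to another one that has not been saved yet, and the order in which records are written does not matter.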