Herlihy and Shavit's book (The Art of Multiprocessor Programming) solution to memory reclamation uses Java's AtomicStampedReference<T>;
.
To write one in C++ for the x86_64 I imagine requires at least a 12 byte swap operation - 8 for a 64bit pointer and 4 for the int.
Is there x86 hardware support for this and if not, any pointers on how to do wait-free memory reclamation without it?
Yes, there is hardware support, though I don't know if it is exposed by C++ libraries. Anyway, if you don't mind doing some low-level unportable assembly language trickery - look up the CMPXCHG16B instruction in Intel manuals.