Search code examples
cinterleavez-order-curve

How to efficiently interleave bits from 8 __int16 numbers?


I am building Morton number for spatial indexing, I have 8 unsigned 16 bit numbers that will turn into __int128 number. The efficiency is crucial, so naive solution (loop over everything) or building separate 8 128bit numbers is too expensive.

I am using GCC, the target machine is 64 bits but without BMI2 support.

How can I speed up the computation?


Solution

  • If your machine is x86 and supports SSE2, there is a clever answer using movmsk instructions. Google SSE2 bit matrix transpose for full code.