PMOVMSKB does a really nice job of packing byte fields into bits.
However I want to do the reverse.
I have a bit field of 16 bits that I want to expand into an XMM register, one byte field per bit.
Preferably a set bit should set the MSB (0x80) of each byte field, but I can live with a set bit resulting in a 0xFF result in the byte field.
I've seen the following option on https://software.intel.com/en-us/forums/intel-isa-extensions/topic/298374:
movd mm0, eax
punpcklbw mm0, mm0
pshufw mm0, mm0, 0x00
pand mm0, [mask8040201008040201h]
pcmpeqb mm0, [mask8040201008040201h]
However this code only works with MMX registers and cannot be made to work with XMM regs because pshufw does not allow that.
I know I can use PSHUFB, however that's SSSE3 and I would like to have SSE2 code because it needs to work on any AMD64 system.
Is there a way to do this in pure SSE2 code?
No intrinsics please, just plain Intel x64 code.
Luckily pshufd is SSE2; you just need to unpack once more. I believe this should work:
movd xmm0, eax
punpcklbw xmm0, xmm0
punpcklbw xmm0, xmm0
pshufd xmm0, xmm0, 0x50
pand xmm0, [mask]
pcmpeqb xmm0, [mask]
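As a sanity check, the sequence above can be modelled byte by byte. The following is only a Python sketch of what each instruction does to the register, not assembly you would ship. It assumes `[mask]` holds 8040201008040201h in both quadwords (the constant from the question), and the helper names (`movd`, `punpcklbw`, `pshufd`, `expand_bits`, `reference`) are made up for the illustration.

```python
# Byte-level model of the SSE2 sequence; index 0 is the least significant byte.
# Assumed contents of [mask]: 8040201008040201h repeated in both quadwords.
MASK = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80] * 2

def movd(eax):
    # movd xmm0, eax: low 32 bits are loaded, the upper 96 bits are zeroed
    return list((eax & 0xFFFFFFFF).to_bytes(4, "little")) + [0] * 12

def punpcklbw(dst, src):
    # interleave the low 8 bytes of dst and src
    out = []
    for d, s in zip(dst[:8], src[:8]):
        out += [d, s]
    return out

def pshufd(src, imm8):
    # each 2-bit field of imm8 selects the source dword for one result dword
    out = []
    for i in range(4):
        sel = (imm8 >> (2 * i)) & 3
        out += src[4 * sel:4 * sel + 4]
    return out

def pand(a, b):
    return [x & y for x, y in zip(a, b)]

def pcmpeqb(a, b):
    return [0xFF if x == y else 0x00 for x, y in zip(a, b)]

def expand_bits(eax):
    x = movd(eax)
    x = punpcklbw(x, x)   # b0 b0 b1 b1 ...
    x = punpcklbw(x, x)   # b0 x4, b1 x4, ...
    x = pshufd(x, 0x50)   # dwords 0,0,1,1 -> b0 x8, b1 x8
    x = pand(x, MASK)     # isolate one bit per byte
    return pcmpeqb(x, MASK)

def reference(bits):
    # byte i is 0xFF exactly when bit i of the input is set
    return [0xFF if (bits >> i) & 1 else 0x00 for i in range(16)]

# exhaustive check over every 16-bit input
all_match = all(expand_bits(v) == reference(v) for v in range(1 << 16))
print(all_match)
```

In this model the upper 16 bits of eax do not affect the result, because pshufd with 0x50 only picks dwords 0 and 1. If you want 0x80 per byte instead of 0xFF, one more pand against a constant of 0x80 bytes would do it.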
Johan said:
If you're starting with a word the first unpack will give you a dword, allowing you to shorten it like so:
movd xmm0, eax
punpcklbw xmm0, xmm0
pshufd xmm0, xmm0, 0x00
pand xmm0, [mask]
pcmpeqb xmm0, [mask]
However, this shortened code does not work. Example: assume the input is 0x00FF (word), that is, we want the low 8 bytes set.
punpcklbw xmm0, xmm0 ; 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF
pshufd xmm0, xmm0, 0x00 ; 00 00 FF FF 00 00 FF FF 00 00 FF FF 00 00 FF FF
pand xmm0, [mask] ; 00 00 20 10 00 00 02 01 00 00 20 10 00 00 02 01
pcmpeqb xmm0, [mask] ; 00 00 FF FF 00 00 FF FF 00 00 FF FF 00 00 FF FF
This is the wrong result because we wanted 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF. Sure, it does give you 8 set bytes, just not the 8 which correspond to the bits.
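The same counterexample can be reproduced with a small Python model of the shortened sequence. This is only a sketch: it assumes `[mask]` holds 8040201008040201h in both quadwords, and `shortened` and `as_dump` are names invented for the illustration.

```python
# Byte-level model of the shortened sequence; index 0 is the least significant byte.
# Assumed contents of [mask]: 8040201008040201h repeated in both quadwords.
MASK = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80] * 2

def shortened(bits):
    x = list(bits.to_bytes(4, "little")) + [0] * 12           # movd xmm0, eax
    x = [b for b in x[:8] for _ in range(2)]                  # punpcklbw xmm0, xmm0
    x = x[:4] * 4                                             # pshufd xmm0, xmm0, 0x00
    x = [a & m for a, m in zip(x, MASK)]                      # pand xmm0, [mask]
    return [0xFF if a == m else 0 for a, m in zip(x, MASK)]   # pcmpeqb xmm0, [mask]

def as_dump(xmm):
    # most significant byte first, matching the register dumps above
    return " ".join(f"{b:02X}" for b in reversed(xmm))

# what we actually want: byte i set exactly when bit i of the input is set
wanted = [0xFF if (0x00FF >> i) & 1 else 0 for i in range(16)]

print(as_dump(shortened(0x00FF)))  # 00 00 FF FF 00 00 FF FF 00 00 FF FF 00 00 FF FF
print(as_dump(wanted))             # 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF
```

The broadcast with pshufd 0x00 puts both input bytes into every dword, so each input byte only ever meets two of the eight mask bits. That is why the second punpcklbw and the 0x50 shuffle are needed.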