I am trying to optimize a function using SSE2, and I'm wondering whether there is a better way to prepare the data for my assembly code. My source data is a series of unsigned chars read from pSrcData. I copy every other byte into an array of floats, because my calculation needs to happen in float.
unsigned char *pSrcData = GetSourceDataPointer();
__declspec(align(16)) float vVectX[4];
vVectX[0] = (float)pSrcData[0];
vVectX[1] = (float)pSrcData[2];
vVectX[2] = (float)pSrcData[4];
vVectX[3] = (float)pSrcData[6];
__asm
{
movaps xmm0, [vVectX]
[...] // do some floating point calculations on float vectors using addps, mulps, etc
}
Is there a quicker way for me to cast every other byte of pSrcData to a float and store it into vVectX?
Thanks!
(1) AND with a mask to zero out the odd bytes (PAND)
(2) Unpack from 16 bits to 32 bits (PUNPCKLWD with a zero vector)
(3) Convert 32 bit ints to floats (CVTDQ2PS)
Three instructions.
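For reference, here is a minimal sketch of those three steps written with SSE2 intrinsics from <emmintrin.h> instead of inline assembly. The helper name load_even_bytes_as_floats is hypothetical, not something from the question. It also needs a fourth instruction (MOVQ) to get the bytes into a register, and that load reads 8 bytes, so this assumes pSrcData has at least 8 readable bytes even though only indices 0, 2, 4 and 6 are used.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: converts src[0], src[2], src[4], src[6] to four floats.
   Assumption: at least 8 bytes are readable at src (the load touches src[7]). */
static __m128 load_even_bytes_as_floats(const unsigned char *src)
{
    /* MOVQ: load 8 bytes into the low half of an XMM register; upper half is zero. */
    __m128i bytes = _mm_loadl_epi64((const __m128i *)src);

    /* (1) PAND: keep the low byte of every 16-bit lane, zeroing the odd bytes. */
    __m128i words = _mm_and_si128(bytes, _mm_set1_epi16(0x00FF));

    /* (2) PUNPCKLWD with a zero vector: widen the four low 16-bit lanes to 32 bits. */
    __m128i dwords = _mm_unpacklo_epi16(words, _mm_setzero_si128());

    /* (3) CVTDQ2PS: convert four 32-bit integers to four floats. */
    return _mm_cvtepi32_ps(dwords);
}
```

The returned __m128 can be used directly with _mm_add_ps, _mm_mul_ps and so on, or written back with _mm_store_ps(vVectX, ...) if the rest of the calculation stays in the __asm block.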