Tags: c++, sse, sse2

Packing and unpacking data for SSE/SSE2 instructions?


I'm trying to learn more about how SSE/SSE2 work. I know that SSE/SSE2 use xmm registers that are 128 bits (16 bytes) wide, and that each register usually holds 4 floats, which I store there by packing. Before getting the result I should "unpack" them.

My question is: since I'm a noob, why should I pack these values into xmm registers and why should I unpack them? What's the advantage in this?


Solution

  • You don't have to pack/unpack them. If the numbers are already in the correct format, you just use a suitable move instruction to load them into a register, or use a memory operand so that the memory content serves as the second operand of an add, subtract, etc.

    What does happen sometimes is that the results of a calculation don't end up in the positions where they need to go, and this is where the various pack and unpack instructions come in handy.
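
    As a minimal sketch of the "no packing needed" case: when the floats already sit next to each other in memory, one load brings four of them into a register. The function name add_arrays and the requirement that n is a multiple of 4 are assumptions made for this example, not something from the question.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <xmmintrin.h>  // SSE intrinsics

    // Adds two float arrays four elements at a time.
    // Assumption for this sketch: n is a multiple of 4.
    void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
        assert(n % 4 == 0);
        for (std::size_t i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);  // unaligned load of 4 floats
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  // 4 additions at once
        }
    }
    ```

    The unaligned load/store variants are used here so the sketch works for any pointer; with 16-byte aligned data, _mm_load_ps and _mm_store_ps could be used instead.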

    Say, for example, that you are doing some 3D graphics math on this struct:

    struct coord { float X, Y, Z, W; };
    

    But to make the calculation efficient, we load four of these structures at once and rearrange them so that X from all four is in one register, Y from all four in another register, etc. Now, after we have, for example, multiplied all X, Y, Z and W values [four at a time] by the transformation matrix to scale/rotate the object, we need to store the results back as X, Y, Z and W again, which is done by "unpacking" the appropriate elements back into their individual X, Y, Z, W entries.
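
    A rough sketch of that flow with intrinsics could look like the following. It uses the _MM_TRANSPOSE4_PS macro from xmmintrin.h for the rearranging step; depending on the compiler, that macro expands to a handful of unpack, shuffle or move instructions. The function name transform4, the row-major matrix layout and the "n is a multiple of 4" requirement are assumptions made for this example.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <xmmintrin.h>  // SSE intrinsics and _MM_TRANSPOSE4_PS

    struct coord { float X, Y, Z, W; };
    static_assert(sizeof(coord) == 4 * sizeof(float), "coord must be tightly packed");

    // One output component for four points: m0*X + m1*Y + m2*Z + m3*W.
    static inline __m128 row_dot(const float* mr, __m128 x, __m128 y, __m128 z, __m128 w) {
        __m128 acc = _mm_mul_ps(_mm_set1_ps(mr[0]), x);
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(mr[1]), y));
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(mr[2]), z));
        return _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(mr[3]), w));
    }

    // Transforms n points by a row-major 4x4 matrix m, four points at a time.
    // Assumption for this sketch: n is a multiple of 4.
    void transform4(const float m[16], const coord* in, coord* out, std::size_t n) {
        assert(n % 4 == 0);
        for (std::size_t i = 0; i < n; i += 4) {
            // Each register holds one whole point: (X, Y, Z, W).
            __m128 x = _mm_loadu_ps(reinterpret_cast<const float*>(&in[i + 0]));
            __m128 y = _mm_loadu_ps(reinterpret_cast<const float*>(&in[i + 1]));
            __m128 z = _mm_loadu_ps(reinterpret_cast<const float*>(&in[i + 2]));
            __m128 w = _mm_loadu_ps(reinterpret_cast<const float*>(&in[i + 3]));
            // Rearrange: x = all four X values, y = all four Y values, etc.
            _MM_TRANSPOSE4_PS(x, y, z, w);

            __m128 ox = row_dot(m + 0, x, y, z, w);
            __m128 oy = row_dot(m + 4, x, y, z, w);
            __m128 oz = row_dot(m + 8, x, y, z, w);
            __m128 ow = row_dot(m + 12, x, y, z, w);

            // Rearrange back so each register is one point again, then store.
            _MM_TRANSPOSE4_PS(ox, oy, oz, ow);
            _mm_storeu_ps(reinterpret_cast<float*>(&out[i + 0]), ox);
            _mm_storeu_ps(reinterpret_cast<float*>(&out[i + 1]), oy);
            _mm_storeu_ps(reinterpret_cast<float*>(&out[i + 2]), oz);
            _mm_storeu_ps(reinterpret_cast<float*>(&out[i + 3]), ow);
        }
    }
    ```

    The two transposes are the "pack" and "unpack" steps here: they are pure data movement, and everything between them is the actual four-at-a-time arithmetic.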

    Of course, if instead of having an array of coord values, you stored four arrays of X, Y, Z and W values, we could just store the new values into their respective slots in the array without packing/unpacking the values.
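
    For comparison, here is a sketch of the four-arrays layout. The type name coords_soa and the per-axis scale function are made up for this example, and it again assumes the number of points is a multiple of 4. Each load already yields four X values (or four Y values, and so on), so there is no rearranging step at all.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>
    #include <xmmintrin.h>  // SSE intrinsics

    // Hypothetical "structure of arrays" layout: one array per component.
    struct coords_soa {
        std::vector<float> X, Y, Z, W;
    };

    // Scales every point per axis, in place, four points at a time.
    // Assumption for this sketch: the arrays have equal length, a multiple of 4.
    void scale(coords_soa& c, float sx, float sy, float sz) {
        const std::size_t n = c.X.size();
        assert(n % 4 == 0 && c.Y.size() == n && c.Z.size() == n);
        const __m128 vx = _mm_set1_ps(sx);
        const __m128 vy = _mm_set1_ps(sy);
        const __m128 vz = _mm_set1_ps(sz);
        for (std::size_t i = 0; i < n; i += 4) {
            // Load four values of one component, multiply, store straight back.
            _mm_storeu_ps(&c.X[i], _mm_mul_ps(_mm_loadu_ps(&c.X[i]), vx));
            _mm_storeu_ps(&c.Y[i], _mm_mul_ps(_mm_loadu_ps(&c.Y[i]), vy));
            _mm_storeu_ps(&c.Z[i], _mm_mul_ps(_mm_loadu_ps(&c.Z[i]), vz));
        }
    }
    ```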