I'm wondering whether the following calculation can be done on four values in parallel within an MMX register:
(a*b)/256
where a is a signed word and b is an unsigned value (blend factor) in the range 0-256.
I think my problem is that I'm not sure how (or if) pmullw and pmulhw can help me with this task.
If you know that a*b won't overflow a signed 16-bit field, then you can use pmullw (MMX intrinsic _mm_mullo_pi16, or SSE2 intrinsic _mm_mullo_epi16) and then shift right by 8 to do the division by 256.
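MMX:
__m64 a, b;
...
a = _mm_mullo_pi16 (a, b);  // low 16 bits of each product
a = _mm_srai_pi16 (a, 8);   // arithmetic shift (psraw) keeps the sign of a; rounds toward negative infinity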
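SSE2:
__m128i a, b;
...
a = _mm_mullo_epi16 (a, b);  // low 16 bits of each product
a = _mm_srai_epi16 (a, 8);   // arithmetic shift (psraw) keeps the sign of a; rounds toward negative infinity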
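If a*b can overflow 16 bits (for example, a full-range a with b close to 256), one possible approach is to combine pmullw and pmulhw: the two instructions give the low and high halves of the 32-bit product, and the middle 16 bits are the product shifted right by 8. The following is only a sketch under the assumption that every lane of b is in 0..256. The helper name blend_scale_epi16 is made up for illustration, and the result rounds toward negative infinity rather than toward zero as C's / does. It is written with SSE2 intrinsics; the MMX counterparts (_mm_mulhi_pi16, _mm_slli_pi16, _mm_srli_pi16, _mm_or_si64) should follow the same pattern.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (not from the original answer): computes
   floor((a*b)/256) in each of eight signed 16-bit lanes.
   Assumes every lane of b holds a value in 0..256. */
static __m128i blend_scale_epi16(__m128i a, __m128i b)
{
    __m128i lo = _mm_mullo_epi16(a, b);  /* pmullw: bits 0..15 of each 32-bit product  */
    __m128i hi = _mm_mulhi_epi16(a, b);  /* pmulhw: bits 16..31 of each signed product */

    /* Bits 8..23 of the product: high half moved up by 8,
       low half moved down by 8 with a logical shift, then merged. */
    return _mm_or_si128(_mm_slli_epi16(hi, 8), _mm_srli_epi16(lo, 8));
}
```

Since |a| is at most 32768 and b is at most 256, the shifted result always fits back into a signed 16-bit lane, so no saturation step should be needed under that assumption.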