assembly, sse, mmx

(a*b)/256 and MMX


I'm wondering if it is possible to do the following calculation on four values in parallel within an MMX register:

(a*b)/256

where a is a signed word and b is an unsigned value (a blend factor) in the range 0-256.

I think my problem is that I'm not sure how (or whether) pmullw and pmulhw will help me with this task.


Solution

  • If you know that a*b won't overflow a signed 16-bit field, then you can use pmullw (MMX intrinsic _mm_mullo_pi16, or SSE2 intrinsic _mm_mullo_epi16) and then shift right by 8 to divide by 256. In the worst case (b = 256) this limits a to the range -128 to 127.

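    For example:

    MMX:

    #include <mmintrin.h>

    __m64 a, b;
    ...
    a = _mm_mullo_pi16 (a, b);   // low 16 bits of each product
    a = _mm_srai_pi16 (a, 8);    // arithmetic shift, since a is signed
    

    SSE2:

    #include <emmintrin.h>

    __m128i a, b;
    ...
    a = _mm_mullo_epi16 (a, b);  // low 16 bits of each product
    a = _mm_srai_epi16 (a, 8);   // arithmetic shift, since a is signed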
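
When the product can exceed the signed 16-bit range (for example a = 1000, b = 128), one possible approach is to combine pmullw with pmulhw and stitch the middle 16 bits of the 32-bit product back together. The following is only a sketch under stated assumptions, not part of the original answer: the helper names blend_mul_shift8, blend_mul_shift8_scalar and blend_lane are hypothetical, every lane of b is assumed to lie in 0..256, and the result is rounded toward negative infinity (which differs from C's truncating division for negative products that are not multiples of 256).

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: floor(a*b / 256) on eight signed 16-bit lanes.
   Assumes every lane of b is in 0..256. */
static __m128i blend_mul_shift8(__m128i a, __m128i b)
{
    __m128i lo = _mm_mullo_epi16(a, b);  /* bits 0..15 of each 32-bit product  */
    __m128i hi = _mm_mulhi_epi16(a, b);  /* bits 16..31 (signed multiply)      */
    /* Bits 8..23 of the product: low byte of hi on top, high byte of lo below. */
    return _mm_or_si128(_mm_slli_epi16(hi, 8), _mm_srli_epi16(lo, 8));
}

/* Scalar reference with explicit floor division, for comparison. */
static int16_t blend_mul_shift8_scalar(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    int32_t q = (p >= 0) ? (p / 256) : -((-p + 255) / 256);
    return (int16_t)q;
}

/* Hypothetical test helper: run one (a, b) pair through the SIMD path
   and return lane 0 of the result. */
static int16_t blend_lane(int16_t a, int16_t b)
{
    int16_t out[8];
    __m128i r = blend_mul_shift8(_mm_set1_epi16(a), _mm_set1_epi16(b));
    _mm_storeu_si128((__m128i *)out, r);
    return out[0];
}
```

The same idea should carry over to MMX by swapping in _mm_mullo_pi16, _mm_mulhi_pi16, _mm_slli_pi16, _mm_srli_pi16 and _mm_or_si64 on __m64 values. Because |a*b/256| never exceeds |a| when b is at most 256, the stitched result always fits back into a signed word.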