
What's the difference between logical SSE intrinsics?


Is there any difference between the logical SSE intrinsics for different types? For example, if we take the OR operation, there are three intrinsics: _mm_or_ps, _mm_or_pd and _mm_or_si128, all of which do the same thing: compute the bitwise OR of their operands. My questions:

  1. Is there any difference between using one intrinsic or another (with appropriate type casting)? Are there any hidden costs, such as longer execution in some specific situation?

  2. These intrinsics map to three different x86 instructions (por, orps, orpd). Does anyone have any idea why Intel is wasting precious opcode space on several instructions which do the same thing?


Solution

  • I think all three are effectively the same, i.e. 128-bit bitwise operations. The different forms probably exist for historical reasons, but I'm not certain. It's possible that the floating-point versions have some additional behaviour, e.g. when there are NaNs, but this is pure guesswork. For normal inputs the instructions seem to be interchangeable, e.g.

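    #include <stdio.h>
    #include <emmintrin.h>  /* SSE2; also provides the SSE (xmmintrin.h) intrinsics */

    /* Print the four 32-bit integer lanes of v, followed by the string end. */
    static void print_epi32(const char *name, __m128i v, const char *end)
    {
        int e[4];
        _mm_storeu_si128((__m128i *)e, v);
        printf("%s = %d %d %d %d%s", name, e[0], e[1], e[2], e[3], end);
    }

    /* Print the four float lanes of v, followed by the string end. */
    static void print_ps(const char *name, __m128 v, const char *end)
    {
        float e[4];
        _mm_storeu_ps(e, v);
        printf("%s = %f %f %f %f%s", name, e[0], e[1], e[2], e[3], end);
    }

    int main(void)
    {
        __m128i a = _mm_set1_epi32(1);
        __m128i b = _mm_set1_epi32(2);
        __m128i c = _mm_or_si128(a, b);

        __m128 x = _mm_set1_ps(1.25f);
        __m128 y = _mm_set1_ps(1.5f);
        __m128 z = _mm_or_ps(x, y);

        print_epi32("a", a, ", "); print_epi32("b", b, ", "); print_epi32("c", c, "\n");
        print_ps("x", x, ", "); print_ps("y", y, ", "); print_ps("z", z, "\n");

        /* Swap the intrinsics: OR the integers with orps and the floats with por.
           The _mm_cast* intrinsics only reinterpret the bits; they convert nothing. */
        c = _mm_castps_si128(_mm_or_ps(_mm_castsi128_ps(a), _mm_castsi128_ps(b)));
        z = _mm_castsi128_ps(_mm_or_si128(_mm_castps_si128(x), _mm_castps_si128(y)));

        print_epi32("a", a, ", "); print_epi32("b", b, ", "); print_epi32("c", c, "\n");
        print_ps("x", x, ", "); print_ps("y", y, ", "); print_ps("z", z, "\n");

        return 0;
    }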
    

    Terminal:

    $ gcc -Wall -msse3 por.c -o por
    $ ./por
    
    a = 1 1 1 1, b = 2 2 2 2, c = 3 3 3 3
    x = 1.250000 1.250000 1.250000 1.250000, y = 1.500000 1.500000 1.500000 1.500000, z = 1.750000 1.750000 1.750000 1.750000
    a = 1 1 1 1, b = 2 2 2 2, c = 3 3 3 3
    x = 1.250000 1.250000 1.250000 1.250000, y = 1.500000 1.500000 1.500000 1.500000, z = 1.750000 1.750000 1.750000 1.750000
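
The test above only compares printed values. As a rough sketch of a stricter check, the helper below ORs the same 128 bits through all three intrinsics and compares the raw bytes of the results. The name or_variants_agree is made up for this illustration; the _mm_cast* intrinsics are the standard SSE2 reinterpreting casts. It only checks that the results are equal and says nothing about whether one form is faster than another.

```c
#include <string.h>
#include <emmintrin.h>  /* SSE2: __m128i, __m128d and the _mm_cast* intrinsics */

/* Hypothetical helper (not from the original answer): OR the same bits with
   _mm_or_si128, _mm_or_ps and _mm_or_pd, then compare the results byte by byte.
   Returns 1 if all three results are bit-identical, 0 otherwise. */
static int or_variants_agree(__m128i a, __m128i b)
{
    __m128i r_int = _mm_or_si128(a, b);
    __m128i r_ps  = _mm_castps_si128(
                        _mm_or_ps(_mm_castsi128_ps(a), _mm_castsi128_ps(b)));
    __m128i r_pd  = _mm_castpd_si128(
                        _mm_or_pd(_mm_castsi128_pd(a), _mm_castsi128_pd(b)));

    unsigned char bytes_int[16], bytes_ps[16], bytes_pd[16];
    _mm_storeu_si128((__m128i *)bytes_int, r_int);
    _mm_storeu_si128((__m128i *)bytes_ps,  r_ps);
    _mm_storeu_si128((__m128i *)bytes_pd,  r_pd);

    return memcmp(bytes_int, bytes_ps, 16) == 0 &&
           memcmp(bytes_int, bytes_pd, 16) == 0;
}
```

Calling or_variants_agree(_mm_set1_epi32(1), _mm_set1_epi32(2)) should return 1. Passing a NaN bit pattern such as _mm_set1_epi32(0x7fc00000) is one way to probe the guess about NaNs made earlier.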