Tags: c++, image-processing, simd, powerpc, altivec

Altivec: analogue of _mm_sad_epu8()


I am trying to port an SSE function that computes the sum of absolute differences of two arrays of 8-bit unsigned integers. It looks like this:

uint64_t AbsDiffSum(const uint8_t * a, const uint8_t * b, size_t size) 
{
    assert(size%16 == 0);
    __m128i _sum = _mm_setzero_si128();
    for(size_t i = 0; i < size; i += 16)
    {
        const __m128i _a = _mm_loadu_si128((__m128i*)(a + i));
        const __m128i _b = _mm_loadu_si128((__m128i*)(b + i));
        _sum = _mm_add_epi64(_sum, _mm_sad_epu8(_a, _b));
    }
    return _mm_cvtsi128_si64(_mm_add_epi64(_sum, _mm_srli_si128(_sum, 8)));
}

The main work is done by the intrinsic function _mm_sad_epu8().

Is there an analogue for Altivec?


Solution

  • Unfortunately, AltiVec has no direct analogue of the _mm_sad_epu8 intrinsic. It can be emulated, though, by computing the per-byte absolute difference as max minus min and summing the result with vec_msum:

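    #include <altivec.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef __vector uint8_t uint8x16_t;
    typedef __vector uint32_t uint32x4_t;
    // Vector of sixteen 8-bit ones, used as the multiplier in vec_msum
    const uint8x16_t K8_01 = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};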
    
    uint64_t AbsDiffSum(const uint8_t * a, const uint8_t * b, size_t size) 
    {
        uint32x4_t _sum = {0, 0, 0, 0};
        for(size_t i = 0; i < size; i += 16)
        {
            // Aligned loads of 128-bit vectors: vec_ld ignores the low 4 bits
            // of the address, so a and b must be 16-byte aligned
            uint8x16_t _a = vec_ld(0, a + i);
            uint8x16_t _b = vec_ld(0, b + i);
            // Absolute difference of unsigned 8-bit values: max - min
            uint8x16_t absDifference = vec_sub(vec_max(_a, _b), vec_min(_a, _b));
            // Multiply each byte by 1 and add every group of 4 bytes
            // to the corresponding 32-bit lane of _sum
            _sum = vec_msum(absDifference, K8_01, _sum);
        }
        // Horizontal sum of the four 32-bit lanes, widened to 64 bits first
        return (uint64_t)vec_extract(_sum, 0) + vec_extract(_sum, 1) + 
               vec_extract(_sum, 2) + vec_extract(_sum, 3);
    }
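
To check a port like this, it can help to have a plain scalar model of the same arithmetic to compare results against. The following is only a sketch, not part of the original answer: the names AbsDiffU8 and AbsDiffSumScalar are hypothetical, and the four-element lanes array is an assumption that mimics how vec_msum with an all-ones multiplier adds each group of four bytes into one 32-bit lane.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: |x - y| for unsigned bytes computed as max - min,
// the same trick the vector code applies to every byte lane.
inline uint8_t AbsDiffU8(uint8_t x, uint8_t y)
{
    const uint8_t hi = x > y ? x : y;
    const uint8_t lo = x > y ? y : x;
    return static_cast<uint8_t>(hi - lo);
}

// Hypothetical scalar reference for the vector AbsDiffSum.
inline uint64_t AbsDiffSumScalar(const uint8_t * a, const uint8_t * b, size_t size)
{
    assert(size % 16 == 0);
    // Models the four 32-bit lanes of the vector accumulator.
    uint32_t lanes[4] = {0, 0, 0, 0};
    for (size_t i = 0; i < size; i += 16)
    {
        // Bytes 0..3 go to lane 0, bytes 4..7 to lane 1, and so on,
        // which is what vec_msum does when the multiplier is all ones.
        for (size_t j = 0; j < 16; ++j)
            lanes[j / 4] += AbsDiffU8(a[i + j], b[i + j]);
    }
    // Widen before the final horizontal sum.
    return static_cast<uint64_t>(lanes[0]) + lanes[1] + lanes[2] + lanes[3];
}
```

Running both versions on the same 16-byte-aligned buffers and comparing the returned sums is a quick way to catch lane or operand mix-ups in the vector code.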