I want to vectorize a C program.
I searched the internet and YouTube but found very little; what I found was not helpful for a beginner like me, and most of it was about C++. The little I understood is that I have to use compiler intrinsics (which can be found in the Intel Intrinsics Guide). I have an old machine which supports the SSE 4.1 and SSE 4.2 instruction sets.
But I cannot move forward with the little knowledge I have, so my question is: how can I vectorize a C program?
As a demonstration, can you show how to optimize the following code:
#include <math.h> // pow
float function(float* Array, int Initial, int Finishing_point)
{
    int k = 0;
    float Vl = 0;
    for (int i = Initial; i < Finishing_point; i++)
    {
        k++;
        Vl = Vl + Array[i] * pow(2, k);
    }
    return Vl;
}
Please note that I need an introductory example, which is why this one only involves a summation, array access and other simple operations.
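Here’s the manually vectorized function. It requires SSE1 and SSE3. Instead of calling pow, it keeps the four current powers of two in a vector and multiplies them by 16 after each group of four elements.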
#include <stddef.h>    // size_t
#include <xmmintrin.h> // SSE 1
#include <pmmintrin.h> // SSE 3
float computeThings( const float* rsi, int idxFirst, int idxEnd )
{
    // Figure out the slice of the input array to consume
    size_t count = (size_t)( idxEnd - idxFirst );
    size_t countAligned = ( count / 4 ) * 4;
    rsi += idxFirst;
    const float* endAligned = rsi + countAligned;
    const float* end = rsi + count;
    // Process majority of inputs with SSE
    __m128 acc = _mm_setzero_ps();
    __m128 kexp = _mm_setr_ps( 2, 4, 8, 16 );
    for( ; rsi < endAligned; rsi += 4 )
    {
        __m128 v = _mm_loadu_ps( rsi );
        v = _mm_mul_ps( v, kexp );
        kexp = _mm_mul_ps( kexp, _mm_set1_ps( 16 ) );
        acc = _mm_add_ps( acc, v );
    }
    // Compute horizontal sum of the `acc` vector
    // acc.xyzw += acc.yyww
    acc = _mm_add_ps( acc, _mm_movehdup_ps( acc ) );
    // acc.x += acc.z
    acc = _mm_add_ss( acc, _mm_unpackhi_ps( acc, acc ) );
    // Process the remaining 0-3 numbers
    for( ; rsi < end; rsi++ )
    {
        __m128 v = _mm_load_ss( rsi );
        v = _mm_mul_ss( v, kexp );
        // kexp.x *= 2, computed as kexp.x += kexp.x
        kexp = _mm_add_ss( kexp, kexp );
        acc = _mm_add_ss( acc, v );
    }
    return _mm_cvtss_f32( acc );
}
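The horizontal sum in the function above relies on the SSE3 instruction behind `_mm_movehdup_ps`. If SSE3 were not available, the same reduction could probably be done with SSE1 shuffles only. The helper below is a rough sketch of that idea; the name `hsum_ps_sse1` is made up for illustration and is not part of the function above.

```c
#include <xmmintrin.h> // SSE 1

// Hypothetical helper (not part of the function above): adds the four
// lanes of v together using SSE1 instructions only.
static float hsum_ps_sse1( __m128 v )
{
    // hi = ( v.z, v.w, v.z, v.w ): upper half copied into the lower half
    __m128 hi = _mm_movehl_ps( v, v );
    // sum.x = v.x + v.z, sum.y = v.y + v.w
    __m128 sum = _mm_add_ps( v, hi );
    // Copy sum.y into every lane of a temporary
    __m128 y = _mm_shuffle_ps( sum, sum, _MM_SHUFFLE( 1, 1, 1, 1 ) );
    // Lane 0 becomes ( v.x + v.z ) + ( v.y + v.w )
    sum = _mm_add_ss( sum, y );
    return _mm_cvtss_f32( sum );
}
```

For example, `hsum_ps_sse1( _mm_setr_ps( 1, 2, 3, 4 ) )` should give 10. Only lane 0 of the result is meaningful, which is enough here because the tail loop works on lane 0 as well.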
Usage example:
float A[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
printf( "%g\n", computeThings( A, 0, 8 ) ); // needs <stdio.h>; prints 3586
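When testing a vectorized function like this, it can help to have a plain scalar version to compare against. The sketch below is one possible reference; the name `computeThingsScalar` is an assumption made for illustration. It uses the same trick as the SSE code, a running multiplier instead of `pow( 2, k )`, which is a worthwhile optimization even before any intrinsics are involved.

```c
// Hypothetical scalar reference (name chosen for illustration): computes
// values[idxFirst] * 2 + values[idxFirst + 1] * 4 + ... without pow().
float computeThingsScalar( const float* values, int idxFirst, int idxEnd )
{
    float acc = 0.0f;
    float kexp = 2.0f; // 2^1, the weight of the first element
    for( int i = idxFirst; i < idxEnd; i++ )
    {
        acc += values[ i ] * kexp;
        kexp *= 2.0f; // advance to the next power of two
    }
    return acc;
}
```

For the array `{ 1, 2, 3, 4, 5, 6, 7, 8 }` and the range 0 to 8 this returns 3586. For general inputs the scalar and SSE versions add the terms in a different order, so their results may differ by a small rounding error; comparing with a tolerance is safer than testing for exact equality.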