arm, vectorization, simd, neon

ARM NEON instruction


I have some code that I want to rewrite using NEON instructions, but I really don't know how to do it. Can anyone help me?

void add(int n,float *a,float *b,float t) {
    int i, size = (n+2) * (n+2);
    for(i = 0; i < size; i++)
        a[i] += t * b[i];
}

Solution

  • By using NEON intrinsics, something like this:

    #include <arm_neon.h>

    void add(int n,float *a,float *b,float t) {
      int i, size = (n+2) * (n+2);
      float32x4_t tempt = vdupq_n_f32(t);        /* broadcast t to all 4 lanes */
      for(i = 0; i < size; i+=4) {
        float32x4_t temp1 = vld1q_f32(a+i);      /* a[i..i+3] */
        float32x4_t temp2 = vld1q_f32(b+i);      /* b[i..i+3] */
        temp1 = vmlaq_f32(temp1, tempt, temp2);  /* temp1 + tempt * temp2 */
        vst1q_f32(a + i, temp1);
      }
    }
    

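    This processes 4 elements per loop iteration, so it only works when size is a multiple of 4. Since size is (n+2) * (n+2), that means n must be even. Otherwise the last iteration reads and writes past the end of both arrays.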
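
If size may not be a multiple of 4, one common approach is to let the vector loop handle the full groups of 4 and finish the remaining 0 to 3 elements with the original scalar loop. The following is only a sketch of that idea, not part of the original answer: the name add_any_size is hypothetical, and the `__ARM_NEON` check is an assumption added so the same file still compiles (scalar only) on a target without NEON.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0   /* assumption: fall back to plain C on non-NEON targets */
#endif

/* Hypothetical helper: same computation as add(), a[i] += t * b[i],
   but valid for any size, not only multiples of 4. */
void add_any_size(int n, float *a, const float *b, float t) {
    int i = 0, size = (n + 2) * (n + 2);

#if HAVE_NEON
    float32x4_t vt = vdupq_n_f32(t);          /* t in all 4 lanes */
    /* Vector part: runs only while a full group of 4 elements remains. */
    for (; i + 4 <= size; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        va = vmlaq_f32(va, vt, vb);           /* va + vt * vb */
        vst1q_f32(a + i, va);
    }
#endif

    /* Scalar tail: the leftover 0..3 elements (or everything without NEON). */
    for (; i < size; i++)
        a[i] += t * b[i];
}
```

For example, with n = 1 the size is 9, so the vector loop covers elements 0 to 7 and the scalar loop handles element 8.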