I am calculating many distances from an n-dimensional point (n is 10 to 39) to each point in an array. I want this to run as fast as possible on Android 4.0+, specifically on the Galaxy S3. I have the hardware FPU working, but I have heard that vectorization with NEON can speed things up further. Questions like "Android build system, NEON and non-NEON builds", however, don't give me a simple answer.
What is the simplest way to use vectorization on the S3, ideally with an example that shows a speed-up for this kind of calculation (distance from an n-dim point to each n-dim point in an array)?
Here is the loop code:

    // go through each point in the vector
    for (bi=0; bi<sizeOfVect; bi++) {
        r[bi] = 0.0; // initialise distance
        // calculate distance in each dimension (d is 10-39 depending)
        for (di=0; di<d; di++) {
            rj[di] = s1[i*d+di] - b[bi*d+di];
            // s1 is the n dim point, b is the vector array
            r[bi] += rj[di]*rj[di];
        }
    }

You have two options for getting vectorization out of a CPU (focusing on ARM here): either the compiler does it for you, or you do it yourself.
To do it yourself, you use the ARM CPU's vector (NEON) instructions directly, either by writing assembly or by using intrinsics.
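
As a rough illustration of the intrinsics route, here is a minimal sketch of the loop from the question. It assumes the data is single-precision `float` (NEON on ARMv7 has no double-precision vector arithmetic), and the names `sq_dist` and `sq_dists_neon` are made up for this example. It computes the squared distance, as the original loop does. When NEON is not available at compile time it falls back to the plain scalar loop, so the same file builds for non-NEON targets as well.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Squared distance between two d-dimensional points p and q. */
static float sq_dist(const float *p, const float *q, size_t d)
{
    size_t di = 0;
    float sum = 0.0f;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process four dimensions per iteration. */
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; di + 4 <= d; di += 4) {
        float32x4_t diff = vsubq_f32(vld1q_f32(p + di), vld1q_f32(q + di));
        acc = vmlaq_f32(acc, diff, diff);      /* acc += diff * diff */
    }
    /* Add the four lanes of acc together. */
    float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    half = vpadd_f32(half, half);
    sum = vget_lane_f32(half, 0);
#endif
    /* Scalar tail (or the whole loop when NEON is not compiled in). */
    for (; di < d; di++) {
        float diff = p[di] - q[di];
        sum += diff * diff;
    }
    return sum;
}

/* s: one d-dimensional point; b: n_points points stored back to back;
   r: receives n_points squared distances. */
void sq_dists_neon(const float *s, const float *b, float *r,
                   size_t n_points, size_t d)
{
    assert(s && b && r);
    for (size_t bi = 0; bi < n_points; bi++)
        r[bi] = sq_dist(s, b + bi * d, d);
}
```

With the question's variables this would be called as `sq_dists_neon(s1 + i*d, b, r, sizeOfVect, d)`. To have the NEON path compiled in, the target needs to be built with NEON enabled, for example `-mfpu=neon` with GCC, or `LOCAL_ARM_NEON := true` for an `armeabi-v7a` build with ndk-build. Whether it is actually faster for d between 10 and 39 is something to measure on the device.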
To get help from the compiler, you have to write vectorizable code. For an example of how to do this, see this SO post.
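
For the compiler route, a version of the question's loop that is easier for an auto-vectorizer to handle might look like the sketch below. The assumptions here are mine: the data is `float`, the arrays do not overlap (which is what `restrict` promises to the compiler), and the function name `sq_dists_autovec` is invented for this example. The main changes are dropping the temporary `rj` array and accumulating into a local variable instead of `r[bi]`.

```c
#include <assert.h>
#include <stddef.h>

/* s: one d-dimensional point; b: n_points points stored back to back;
   r: receives n_points squared distances.
   restrict tells the compiler that s, b and r never alias. */
void sq_dists_autovec(const float *restrict s, const float *restrict b,
                      float *restrict r, size_t n_points, size_t d)
{
    assert(s && b && r);
    for (size_t bi = 0; bi < n_points; bi++) {
        const float *q = b + bi * d;   /* start of point bi */
        float sum = 0.0f;              /* local accumulator, not r[bi] */
        for (size_t di = 0; di < d; di++) {
            float diff = s[di] - q[di];
            sum += diff * diff;
        }
        r[bi] = sum;                   /* single store per point */
    }
}
```

With GCC targeting ARMv7, something like `-O3 -mfpu=neon -ftree-vectorize -funsafe-math-optimizations` would be a reasonable starting point: GCC does not auto-vectorize floating-point code for NEON without `-funsafe-math-optimizations`, because NEON arithmetic on ARMv7 is not fully IEEE 754 compliant, and a floating-point sum can only be vectorized if the compiler is allowed to reorder the additions. It is worth checking the generated assembly to confirm that NEON instructions were actually emitted.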