This is a very strange problem. I cannot see any differences between code1 and code2 . However, there should be a difference because they produce different results : (notice f0 and f0A (acts as a buffer))
code1 :
for (k = 0; k < 6; k++) {
r1 = i0 + 6 * k;
f0 = 0.0F;
for (r2 = 0; r2 < 6; r2++) {
f0 += (float)b_a[i0 + 6 * r2] * p_est[r2 + 6 * k];
}
a[r1] = f0;
}
code2:
float f0A[6] = {0};
for (k = 0; k < 6; k++) {
r1 = i0 + 6 * k;
for (r2 = 0; r2 < 6; r2++) {
f0A[r2] += (float)b_a[i0 + 6 * r2] * p_est[r2 + 6 * k];
}
}
for (r2 = 0; r2 < 6; r2++) {
r1 = i0 + 6 * r2;
a[r1] = f0A[r2];
}
In the first loop, you are setting a[r1] to a summation stored in f0. It is being added to each loop.
In the second loop, you aren't doing a summation, your loop is using += but it is storing each one in a different f0A index. Thus a[r1] is not given the correct value
There is the difference