My program does NxN matrices multiplication where elements of both the matrices are initialized to values (0, 1, 2, ... N) using a for loop. Both the matrix elements are of type float. There is no memory allocation problem. Matrix sizes are input as a multiple of 4 eg: 4x4 or 8x8 etc. The answers are verified with a sequential calculation. Everything works fine upto matrix size of 64x64. A difference between the sequential version and SSE version is observed only when the matrix size exceeds 64 (eg: 68 x 68).
SSE snippet is as shown (size = 68):
void matrix_mult_sse(int size, float *mat1_in, float *mat2_in, float *ans_out) {
__m128 a_line, b_line, r_line;
int i, j, k;
for (k = 0; k < size * size; k += size) {
for (i = 0; i < size; i += 4) {
j = 0;
b_line = _mm_load_ps(&mat2_in[i]);
a_line = _mm_set1_ps(mat1_in[j + k]);
r_line = _mm_mul_ps(a_line, b_line);
for (j = 1; j < size; j++) {
b_line = _mm_load_ps(&mat2_in[j * size + i]);
a_line = _mm_set1_ps(mat1_in[j + k]);
r_line = _mm_add_ps(_mm_mul_ps(a_line, b_line), r_line);
}
_mm_store_ps(&ans_out[i + k], r_line);
}
}
}
With this, the answer differs at element 3673 where I get the answers of multiplication as follows
scalar: 576030144.000000 & SSE: 576030208.000000
I also wrote a similar program in Java with the same initialization and setup and N = 68 and for element 3673, I got the answer as 576030210.000000
Now there are three different answers and I'm not sure how to proceed. Why does this difference occur and how do we eliminate this?
I am summarizing the discussion in order to close this question as answered.
So according to the article (What Every Computer Scientist Should Know About Floating-Point Arithmetic) in link, floating point always results in a rounding error which is a direct consequence of the approximate representation nature of the floating point number.
Arithmetic operations such as addition, subtraction etc results in a precision error. Hence, the 6 most significant digits of the floating point answer (irrespective of where the decimal point is situated) can be considered to be accurate while the other digits may be erroneous (prone to precision error).