Tags: python, tensorflow, keras

Keras/Tensorflow have slightly different outputs on different CPU architectures


I know similar questions have been asked in the past, for example Is it normal that model output slightly different on different platforms?, among others.

I am fully aware of differences originating from floating-point operations on e.g. CPU vs. GPU, and of the fact that different OSs (e.g., Mac vs. Linux in the linked SO question above) may use different binary libraries in the backend, etc.

However, the small difference I see in my Keras model output is between two machines (on AWS) with the same OS and exactly the same set of Python package versions (including, of course, TensorFlow). The only difference is the model/architecture of the CPU: one has "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz" and the other "Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz". Simple floating-point operations like 0.1 + 0.2 - 0.3 give the same (non-zero) result on both machines, though a deep network involves vastly more floating-point operations, so this simple check may not be representative.
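For reference, the simple check mentioned above can be run like this; the result is non-zero because 0.1, 0.2, and 0.3 are not exactly representable in binary floating point, and it is identical on any machine using IEEE-754 double precision with round-to-nearest:

```python
# 0.1 + 0.2 - 0.3 is non-zero in IEEE-754 double precision,
# because none of the three decimals is exactly representable in binary.
result = 0.1 + 0.2 - 0.3
print(result)          # 5.551115123125783e-17
print(result == 0.0)   # False
```

The fact that this matches across both machines only shows that single scalar operations agree; it says nothing about long chains of operations whose order may differ.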

My question is, can such a difference in CPU architectures explain the different output? The difference I see is in the 5th digit of the final prediction of the model, for some inputs.


Solution

  • Yes, this is to be expected. Different CPUs have (slightly) different floating-point implementations and different bugs (remember the original Pentium FDIV bug?); even the same CPU model with different steppings (essentially hardware revisions) can have different bugs.

    Parallelism also plays a role: thread scheduling is effectively non-deterministic, so the order in which partial results are combined can vary between runs and between machines. This matters because floating-point addition is not associative: (a + b) + c is not in general equal to a + (b + c), due to rounding. In addition, numerical libraries used by TensorFlow (such as Eigen and oneDNN) can dispatch different SIMD kernels depending on the CPU's instruction-set features, which changes vector widths and reduction orders between CPU models.
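A minimal demonstration of the non-associativity mentioned above, using a classic catastrophic-cancellation case in pure Python:

```python
# Floating-point addition is not associative: regrouping changes rounding.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c    # a + b cancels exactly to 0.0, then + 1.0
right = a + (b + c)   # 1.0 is absorbed: -1e16 + 1.0 rounds back to -1e16

print(left)           # 1.0
print(right)          # 0.0
print(left == right)  # False
```

A parallel reduction that sums the same values in a different order hits exactly this effect, just with much smaller per-step errors that accumulate across millions of operations in a deep network.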

    So yes, there are several plausible explanations for why your numbers differ slightly between these two CPUs.