floating-point, overflow, argmax, underflow, float32

Why does argmax of [0., 1e-8] give these results?


When I ran the code below, I got x=1, y=0:

import numpy as np

a = np.array([0., 1e-8]).astype('float32')
x = a.argmax()      # 1
y = (a+1).argmax()  # 0

I am already familiar with floating-point representation, but I don't understand why I get x=1 and y=0.
I suspect that overflow or underflow is related to the results. Please help explain!


Solution

  • When 1e-8 is converted to float32, the result is the nearest number representable in float32, which is 9.99999993922529029077850282192230224609375e-09, or about 1.34•2^−27.

    So, in real-number arithmetic, the sum of 1 and 9.99999993922529029077850282192230224609375e-09 is about 1 + 1.34•2^−27. The two numbers representable in float32 that are closest to that are 1 and 1 + 2^−23. Of those two, 1 is closer to 1 + 1.34•2^−27, so float32 addition produces 1 as the result. Thus, the elements of a+1 are 1 and 1. Both are tied candidates for the maximum, and argmax returns the index of the first one, which is why y=0. In a itself, the elements are 0 and about 1e-8, so the maximum is unambiguous and x=1. Neither overflow nor underflow is involved; this is ordinary rounding in the addition.
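
The rounding argument above can be checked directly in NumPy. The following is only a sketch: the variable names (stored, ulp_at_one, rounds_to_one, b, a64) are chosen for illustration and do not come from the question, and the float64 comparison at the end is an added assumption about what is useful to contrast.

```python
import numpy as np

a = np.array([0.0, 1e-8], dtype=np.float32)

# The value actually stored: the float32 nearest to 1e-8.
stored = float(a[1])

# Gap between 1.0 and the next larger float32, which is 2**-23.
ulp_at_one = float(np.spacing(np.float32(1.0)))

# The stored value is smaller than half that gap,
# so 1 + stored rounds back down to exactly 1 in float32.
rounds_to_one = stored < ulp_at_one / 2

b = a + 1  # the result stays float32

print(stored, ulp_at_one, rounds_to_one)
print(b, b.dtype)
print(a.argmax(), b.argmax())  # 1 0

# For contrast (illustrative only): float64 has a much finer gap
# near 1, so the two sums stay distinct and there is no tie.
a64 = a.astype(np.float64)
print((a64 + 1).argmax())  # 1
```

If a tie like this matters in practice, one option is to do the addition in float64, where the gap between 1 and the next representable number is 2^−52 rather than 2^−23.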