python arrays numpy

Why is Numpy converting an "object"-"int" type to an "object"-"float" type?


This could be a bug, or could be something I don't understand about when numpy decides to convert the types of the objects in an "object" array.

import numpy as np
X = np.array([5888275684537373439, 1945629710750298993],dtype=object) + [1158941147679947299,0]
Y = np.array([5888275684537373439, 1945629710750298993],dtype=object) + [11589411476799472995,0]
Z = np.array([5888275684537373439, 1945629710750298993],dtype=object) + [115894114767994729956,0]
print(type(X[0]),X[0]) # <class 'int'> 7047216832217320738
print(type(Y[0]),Y[0]) # <class 'float'> 1.7477687161336848e+19
print(type(Z[0]),Z[0]) # <class 'int'> 121782390452532103395

The arrays themselves remain "object" type (as expected). What is unexpected is that the elements of the Y array were converted to floats. Why is that happening? As a consequence, I immediately lose precision in my combinatorics. To make things even stranger, removing the 0 fixes things:

X = np.array([5888275684537373439, 1945629710750298993],dtype=object) + [1158941147679947299]
Y = np.array([5888275684537373439, 1945629710750298993],dtype=object) + [11589411476799472995]
Z = np.array([5888275684537373439, 1945629710750298993],dtype=object) + [115894114767994729956]
print(type(X[0]),X[0]) # <class 'int'> 7047216832217320738
print(type(Y[0]),Y[0]) # <class 'int'> 17477687161336846434
print(type(Z[0]),Z[0]) # <class 'int'> 121782390452532103395

I have tried other things, such as using larger/smaller numbers, but rarely (if ever) end up with "floats". It is something very specific about the size of these particular "int" values.


Better code that shows the problem.

import numpy as np
A = np.array([1,1],dtype=object) + [2**62,0]
B = np.array([1,1],dtype=object) + [2**63,0]
C = np.array([1,1],dtype=object) + [2**64,0]
D = np.array([1,1],dtype=object) + [2**63]
E = np.array([1,1],dtype=object) + [2**63,2**63]
print(type(A[0]),A[0]) # <class 'int'> 4611686018427387905
print(type(B[0]),B[0]) # <class 'float'> 9.223372036854776e+18
print(type(C[0]),C[0]) # <class 'int'> 18446744073709551617
print(type(D[0]),D[0]) # <class 'int'> 9223372036854775809
print(type(E[0]),E[0]) # <class 'int'> 9223372036854775809

Solution

    In [323]: X = np.array([5888275684537373439, 1945629710750298993],dtype=object)
    

    Case 1: the integer in the second argument is small enough to fit in int64:

    In [324]: X+[1158941147679947299,0]
    Out[324]: array([7047216832217320738, 1945629710750298993], dtype=object)
    

    The result is the same if we explicitly make an object array:

    In [325]: X+np.array([1158941147679947299,0],object)
    Out[325]: array([7047216832217320738, 1945629710750298993], dtype=object)
    

    Case 2: conversion to floats:

    In [326]: X+[11589411476799472995,0]
    Out[326]: array([1.7477687161336848e+19, 1.945629710750299e+18], dtype=object)
    

    Again, with an explicit object array it's fine:

    In [327]: X+np.array([11589411476799472995,0],object)
    Out[327]: array([17477687161336846434, 1945629710750298993], dtype=object)
    

    Converting the list to an array without a dtype specification produces a float array, and that float dtype propagates through the sum:

    In [328]: np.array([11589411476799472995,0])
    Out[328]: array([1.15894115e+19, 0.00000000e+00])
    

    whereas the list in the first case is small enough to be int64:

    In [329]: np.array([1158941147679947299,0])
    Out[329]: array([1158941147679947299,                   0], dtype=int64)
    

    Case 3: the values remain ints:

    In [330]: X+[115894114767994729956,0]
    Out[330]: array([121782390452532103395, 1945629710750298993], dtype=object)
    
    In [331]: X+np.array([115894114767994729956,0],object)
    Out[331]: array([121782390452532103395, 1945629710750298993], dtype=object)
    

    This value is too large even for uint64, so the list becomes an object dtype array:

    In [332]: np.array([115894114767994729956,0])
    Out[332]: array([115894114767994729956, 0], dtype=object)
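
The three cases line up with three integer ranges. As a rough sketch in plain Python (the function name `narrowest_fit` is made up for illustration, and it only handles non-negative values), each of the three added integers can be classified by the narrowest container that holds it:

```python
# Boundaries of the fixed-width integer types involved in the three cases.
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1

def narrowest_fit(value):
    """Hypothetical helper: name the narrowest of int64 / uint64 / object
    that can hold a non-negative Python int."""
    if value <= INT64_MAX:
        return "int64"
    if value <= UINT64_MAX:
        return "uint64"
    return "object"

for value in (1158941147679947299, 11589411476799472995, 115894114767994729956):
    print(value, narrowest_fit(value))
```

Only the middle value lands in the uint64-only range, which is the one that misbehaves when it is paired with a 0.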
    

    So the key difference is in how the list is converted to an array. Object dtype is a fallback option, something NumPy uses only when it can't make a "regular" numeric array. You should always assume that object dtype math is a second-class feature, chosen as second best.
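
A likely explanation for the float in the second case, offered as a sketch rather than a statement about NumPy internals: the large value needs uint64 while the 0 fits int64, and the common type NumPy reports for that pair is float64. The variable names below are illustrative only.

```python
import numpy as np

# int64 and uint64 share no integer type wide enough for both,
# so their promoted (common) dtype is a float.
common = np.promote_types(np.int64, np.uint64)
print(common)

# A float64 cannot hold every integer of this magnitude exactly,
# which is where the precision goes.
big = 11589411476799472995
round_trips = int(float(big)) == big
print(round_trips)

# Inspect what NumPy infers for each list when no dtype is given.
for values in ([1158941147679947299, 0], [big, 0], [big], [115894114767994729956, 0]):
    print(values, np.array(values).dtype)
```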

    The second case, without the 0, is another dtype:

    In [334]: np.array([11589411476799472995])
    Out[334]: array([11589411476799472995], dtype=uint64)
    

    It is never wise to make assumptions about when a list is converted into an object dtype array. If that feature is important, make it explicit!
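
One way to make it explicit, as a minimal sketch: wrap the right-hand operand in an object array before adding, so the Python ints are never routed through a numeric dtype. The helper name `as_object_array` is hypothetical, not a NumPy function.

```python
import numpy as np

def as_object_array(values):
    """Hypothetical helper: build an object array so Python ints stay exact."""
    return np.array(values, dtype=object)

X = np.array([5888275684537373439, 1945629710750298993], dtype=object)

# Both operands are object arrays, so the addition uses Python int arithmetic.
Y = X + as_object_array([11589411476799472995, 0])
print(type(Y[0]), Y[0])  # <class 'int'> 17477687161336846434
```

This matches the explicit-object result shown in the session above and does not depend on how large the individual values happen to be.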