python, pandas, numpy, dictionary, primitive

Is this a bug? Cannot simply change dict keys from numpy to primitive data types


I have a dictionary generated by pandas which has numpy.int64 objects instead of native ints as keys. I need to change these to the native type, and I am confused as to why the following code does not work:

import numpy as np
d = {np.int64(0): None}

for k, v in d.items():
    print(str(type(k)))     # <class 'numpy.int64'>
    k_nat = k.item()
    print(str(type(k_nat))) # <class 'int'>
    print(d)                # {0: None}
    d.update({k_nat:1})
    print(d)                # {0: 1}
                            # Therefore update using int was successful

for k, v in d.items():
    print(str(type(k)))     # <class 'numpy.int64'>

Can anyone explain what's going on here? From my perspective, this code contradicts itself as the update using the primitive k_nat was successful, but in the end the key is still a numpy.int64.


Solution

  • No, this is not a bug.

    This code shows that the key has not changed during the update:

    import numpy as np
    d = {np.int64(0): None}
    
    for k, v in d.items():
        print(str(type(k)))     # <class 'numpy.int64'>
        k_nat = k.item()
        print(str(type(k_nat))) # <class 'int'>
        print(d)                # {0: None}
        d.update({k_nat:1})
        print(d)                # {0: 1}
                                # Therefore update using int was successful
                                # But key does not change
        print(type(list(d.keys())[0])) # → <class 'numpy.int64'>
    
    for k, v in d.items():
        print(str(type(k)))     # <class 'numpy.int64'>
    

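    Python treats int(0) and np.int64(0) as the same key for dict access, because they compare equal and have the same hash. The update therefore finds the existing entry and replaces only its value; the original key object stays in place. Note that both int(0) and np.int64(0) are shown as 0 in expressions like print(d) (in NumPy versions before 2.0; newer versions show np.int64(0)), so they look as if they are the same. However, they are equal but not identical.

    In particular, we have this behavior: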

    print(d[np.int64(0)] == d[int(0)]) # True
    print(np.int64(0) == int(0)) # True
    print(np.int64(0) is int(0)) # False
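
The lookup behavior described above can be checked directly. This is a minimal sketch, assuming NumPy is installed; the variable names are only for illustration. It shows that the two keys share a hash and compare equal, so assigning through the int key reuses the existing entry instead of adding a second one.

```python
import numpy as np

d = {np.int64(0): None}

# A dict treats two objects as the same key when their hashes match
# and they compare equal.
same_hash = hash(np.int64(0)) == hash(0)
same_value = bool(np.int64(0) == 0)

# Assigning through the int key reuses the existing entry:
# the value changes, the stored key object does not.
d[0] = 1
stored_key_type = type(next(iter(d)))

print(same_hash, same_value, len(d), stored_key_type)
```

The dict still has one entry, and its key is still the numpy.int64 object that was inserted first.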
    

    If you want to convert the key-type, you can use:

    new_d = {int(k): v for k, v in d.items()}
    print(type(list(new_d.keys())[0])) # <class 'int'>
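
If the existing dict object has to be kept (for example because other code holds a reference to it), the keys can also be replaced in place. The following is a sketch under that assumption, not the only way to do it: each numpy key is removed with pop() and its value is re-inserted under a native int.

```python
import numpy as np

d = {np.int64(0): None, np.int64(1): "a"}

# Iterate over a snapshot of the keys, because the dict is modified in the loop.
for k in list(d):
    if isinstance(k, np.integer):
        # The right-hand side runs first: pop() removes the numpy key,
        # then the value is stored again under a native int key.
        d[int(k)] = d.pop(k)

print([type(k) for k in d])
```

Removing the old key first matters: assigning d[int(k)] while the numpy key is still present would only update the value and leave the numpy key in place, as in the question.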
    

    For some classes it is indeed possible to change the type of an object without changing its id, so it still works as the same dict key:

    class A(object):
        pass
    
    class B(object):
        pass
    
    d = {A(): None}
    
    print(type(list(d.keys())[0])) # <class '__main__.A'>
    
    # change type of object but not the object itself
    list(d.keys())[0].__class__ = B
    print(type(list(d.keys())[0])) # <class '__main__.B'>
    

    However, for some other classes (including np.int64) this is not possible:

    x = np.int64(0)
    try:
        x.__class__ = int
    except TypeError as err:
        # e.g. "__class__ assignment only supported for heap types or ModuleType subclasses"
        # (the exact wording depends on the Python version)
        print(err)