
Python numpy.divmod and integer representation


I was trying to use numpy.divmod with very large integers and noticed some strange behaviour. At around 2**63 ~ 1e19 (which should be the limit for the usual memory representation of int in Python 3.5+), this happens:

from numpy import divmod

test = 10**6
for i in range(15,25):
  x = 10**i
  print(i, divmod(x, test))

15 (1000000000, 0)
16 (10000000000, 0)
17 (100000000000, 0)
18 (1000000000000, 0)
19 (10000000000000.0, 0.0)
20 ((100000000000000, 0), None)
21 ((1000000000000000, 0), None)
22 ((10000000000000000, 0), None)
23 ((100000000000000000, 0), None)
24 ((1000000000000000000, 0), None)

Somehow, the quotient and remainder work fine up to 2**63, and then the output changes.

My guess is that the int representation is "vectorized" (i.e. like BigInt in Scala, stored as a little-endian Seq of Long). But then I'd expect divmod(array, test) to return a pair of arrays: the array of quotients and the array of remainders.

I have no idea what is going on here. It does not happen with the built-in divmod, where everything works as expected.

Why does this happen? Does it have something to do with int internal representation?

Details: numpy version 1.13.1, python 3.6


Solution

  • The problem is that np.divmod converts its arguments to arrays, and what happens next is quite simple:

    >>> np.array(10**19)
    array(10000000000000000000, dtype=uint64)
    >>> np.array(10**20)
    array(100000000000000000000, dtype=object)
    

    You get an object array for 10**i with i > 19; in the other cases it is a "real" NumPy array with a native dtype.

    And, indeed, it seems like object arrays behave strangely with np.divmod:

    >>> np.divmod(np.array(10**5, dtype=object), 10)   # smaller value but object array
    ((10000, 0), None)
    

    I guess that in this case Python's built-in divmod computes the first returned element, and the remaining items are filled with None because the call was delegated to the Python function.

    Note that object arrays often behave differently from arrays with a native dtype. They are much slower and often delegate to Python functions, which is frequently the reason for the different results.
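
The dtype explanation above can be checked directly. The following is a minimal sketch, not a definitive fix: it prints the dtype NumPy infers for 10**18, 10**19 and 10**20, shows that promoting uint64 with int64 yields float64 (which may be why the i = 19 row in the question prints floats), and then sidesteps object arrays by calling the built-in divmod on plain Python ints. The helper name divmod_exact is hypothetical and not part of NumPy.

```python
import builtins

import numpy as np

# The dtype NumPy infers for a single Python int depends on its magnitude.
inferred = {exp: np.array(10**exp).dtype for exp in (18, 19, 20)}
for exp, dtype in inferred.items():
    print(f"10**{exp} -> {dtype}")

# uint64 and int64 have no common integer type, so the promotion rules
# fall back to float64; this may be why the i = 19 row shows floats.
promoted = np.promote_types(np.uint64, np.int64)
print(promoted)


def divmod_exact(values, divisor):
    """Hypothetical helper: exact divmod on Python ints, bypassing NumPy."""
    # builtins.divmod is used explicitly in case numpy's divmod was imported
    # under the same name, as in the question.
    pairs = [builtins.divmod(int(v), divisor) for v in values]
    quotients = [q for q, _ in pairs]
    remainders = [r for _, r in pairs]
    return quotients, remainders


quotients, remainders = divmod_exact([10**i for i in range(19, 22)], 10**6)
print(quotients, remainders)
```

Because Python ints have arbitrary precision, the helper returns exact quotients and remainders for any size of input, at the cost of running a plain Python loop instead of a vectorized NumPy operation.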