Tags: python, numpy, overflow, paillier

OverflowError: Python int too large to convert to C long when feed data into numpy array


I am trying to store large numbers produced by encryption in a NumPy array, but I get an error saying the number is too large and overflows. I checked the code, and everything is correct until the numbers are stored in the array. The error is raised at the assignment step, en1[i,j] = pk.raw_encrypt(int(test1[i,j])).

The encrypted number I get here is 3721469428823308171852407981126958588051758293498563443424608937516905060542577505841168884360804470051297912859925781484960893520445514263696476240974988078627213135445788309778740044751099235295077596597798031854813054409733391824335666742083102231195956761512905043582400348924162387787806382637700241133312260811836700206345239790866810211695141313302624830782897304864254886141901824509845380817669866861095878436032979919703752065248359420455460486031882792946889235009894799954640035281227429200579186478109721444874188901886905515155160376705016979283166216642522595345955323818983998023048631350302980936674. Python 3 still reports it as an int. The number itself does not overflow as a Python int; the NumPy array simply refuses to accept it.

What property of NumPy causes this, and is there any solution? I have considered using lists instead of a NumPy array, but that would be much harder to implement when the data is not 1-D. The full test code is below.

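import numpy as np
from phe import paillier  # python-paillier

test1 = np.array([[1,2,3],[1,2,4]])
test2 = np.array([[4,1,3],[6,1,5]])

# np.copy keeps the fixed-width integer dtype of test1/test2
en1 = np.copy(test1)
en2 = np.copy(test2)

pk, sk = paillier.generate_paillier_keypair()

en_sum = np.copy(en1)
pl_sum = np.copy(en1)

for i in range(test1.shape[0]):
    for j in range(test2.shape[1]):
        # OverflowError is raised here: the ciphertext does not fit the array's dtype
        en1[i,j] = pk.raw_encrypt(int(test1[i,j]))
        en2[i,j] = pk.raw_encrypt(int(test2[i,j]))

        # multiplying ciphertexts adds the underlying plaintexts
        en_sum[i,j] = en1[i,j]*en2[i,j]
        pl_sum[i,j] = sk.raw_decrypt(en_sum[i,j])

sum = sk.raw_decrypt(en_sum)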
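For reference, the same failure can be reproduced without the Paillier library. This minimal sketch assumes only NumPy; the value 2 ** 2047 + 12345 is an arbitrary stand-in for a ciphertext of similar size, not a real one.

```python
import numpy as np

# Same construction as in the question: np.array picks the platform's
# default fixed-width integer dtype, and np.copy keeps that dtype.
test1 = np.array([[1, 2, 3], [1, 2, 4]])
en1 = np.copy(test1)

# Arbitrary stand-in for a ~2048-bit ciphertext; any int wider than
# the dtype's bit width triggers the same error.
big = 2 ** 2047 + 12345

caught = None
try:
    en1[0, 0] = big
except OverflowError as exc:
    caught = exc

print(en1.dtype, type(caught).__name__)
```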

Solution

  • Python integers are stored with arbitrary precision, while NumPy integers use fixed-width 32-bit or 64-bit representations, depending on the dtype and your platform.

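    This means that the largest Python integer is bounded only by your system memory, while the largest NumPy integer is bounded by what fits in 64 bits. In your code, np.copy(test1) keeps test1's fixed-width integer dtype (a C long on most platforms, hence the wording of the error), so every element of en1 must fit in that width, and a ciphertext of more than 2,000 bits cannot.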

    You can check the largest representable unsigned 64-bit value like this:

    >>> import numpy as np
    >>> np.iinfo(np.uint64).max
    18446744073709551615
    
    >>> 2 ** 64 - 1
    18446744073709551615
    

    The best approach for your application depends on what you want to do with these extremely large integers, but I'd lean toward avoiding NumPy's fixed-width integer arrays for values of this size.
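One common workaround, not part of the answer above, is to keep the array but give it dtype=object, so each cell holds a reference to an ordinary Python int with arbitrary precision while the 2-D shape and indexing stay the same. The sketch below assumes only NumPy; fake_big is a hypothetical stand-in for pk.raw_encrypt that merely produces large integers and performs no encryption.

```python
import numpy as np

test1 = np.array([[1, 2, 3], [1, 2, 4]])
test2 = np.array([[4, 1, 3], [6, 1, 5]])

# dtype=object: each cell stores a Python object (here a Python int),
# so values are not truncated to a fixed bit width.
en1 = np.empty(test1.shape, dtype=object)
en2 = np.empty(test2.shape, dtype=object)


def fake_big(x):
    """Hypothetical stand-in for pk.raw_encrypt: returns a large int, NOT encryption."""
    return 2 ** 2047 + int(x)


for i in range(test1.shape[0]):
    for j in range(test1.shape[1]):
        en1[i, j] = fake_big(test1[i, j])
        en2[i, j] = fake_big(test2[i, j])

# Element-wise * on object arrays calls Python's int multiplication for
# each pair of cells, so the products are exact.
en_sum = en1 * en2
print(en_sum.shape, en_sum.dtype, en_sum[0, 0].bit_length())
```

The trade-off is speed: arithmetic on object arrays runs Python-level operations element by element instead of compiled NumPy loops. With the real library, the same pattern should work with pk.raw_encrypt in place of fake_big; since sk.raw_decrypt takes a single ciphertext, decrypting a whole array would still need a loop (or np.vectorize with otypes=[object]).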