groupby() function in pandas raising IndexError: index 2 is out of bounds for axis 0 with size 2

After upgrading my Python environment, I noticed that the groupby() function from the pandas library occasionally raises this error:

IndexError: index 2 is out of bounds for axis 0 with size 2

Everything runs fine in the older Python environment. In this particular case, the error means that a certain column has two unique values (e.g. a and b), but the related pandas functions generate the indices [0, 1, 2]. Index 2 has no corresponding unique value, hence the error.
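To make the error message concrete, here is a minimal sketch in plain NumPy (not the pandas internals). The names uniques and codes are made up for illustration: two unique values are indexed with three positions, so position 2 has nothing to point at.

```python
import numpy as np

# Hypothetical stand-ins: two unique values, so only positions 0 and 1 are valid.
uniques = np.array(["a", "b"])
# Position 2 has no matching unique value.
codes = np.array([0, 1, 2])

try:
    uniques[codes]
    message = ""
except IndexError as exc:
    # Capture the text of the error raised by the out-of-range position.
    message = str(exc)

print(message)
```

This only shows how such a message arises when a position exceeds the number of unique values; it does not reproduce the groupby() code path itself.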

Since the error did not seem to follow any obvious pattern, I dug into the pandas code and tracked the source of the problem down to the function decons_group_index() in the sorting.py file. The issue can be illustrated with the following piece of code.

import numpy as np

x = np.array([2076999867579399,
              2077965839147919,
              2078931810716439,
              2079897782284959,
              2080863753853479,
              2081829725421999,
              2082795696990519,
              2083761668559039])

y = np.array([0, 0, 0, 0, 0, 0, 0, 0])
factor = 160995261420
shape = 1

labels = (x - y) % (factor * shape) // factor

print(labels)

If I run the code in Python 3.7.3.final.0, I get [0 0 0 0 0 0 0 0], which is the expected behavior: with shape = 1, the remainder is always smaller than factor, so the floor division must give 0. However, if I run it in Python 3.9.6.final.0, I get [1 1 1 1 1 1 1 1], which triggers the type of error mentioned above.
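One way to check which result is right, independent of the NumPy version, is to repeat the calculation with plain Python integers, which have arbitrary precision. This is only a sketch of such a cross-check; the name expected is made up for illustration, and the outcome of the final comparison depends on the installed NumPy build.

```python
import numpy as np

x = np.array([2076999867579399,
              2077965839147919,
              2078931810716439,
              2079897782284959,
              2080863753853479,
              2081829725421999,
              2082795696990519,
              2083761668559039])
y = np.array([0, 0, 0, 0, 0, 0, 0, 0])
factor = 160995261420
shape = 1

# Reference result computed with plain Python ints, bypassing NumPy arithmetic.
expected = [(int(a) - int(b)) % (factor * shape) // factor
            for a, b in zip(x, y)]

# Same expression evaluated by NumPy, as in the snippet above.
labels = (x - y) % (factor * shape) // factor

print(expected)
# True on an unaffected NumPy build, False on an affected one.
print(np.array_equal(labels, expected))
```

If the comparison prints False, the discrepancy comes from the NumPy arithmetic rather than from pandas.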

I wonder whether anyone has experienced anything similar and whether there is a simple and elegant way to fix the issue. I am also not sure whether this should be considered a bug and reported somewhere.

Many thanks in advance,

Macky

Solution

OK, so it turned out to be a bug in NumPy. Reported here.

Macky