After upgrading my Python environment, I noticed that the groupby() function from the pandas library occasionally raises an error of the form
IndexError: index 2 is out of bounds for axis 0 with size 2
even though everything runs fine in the older Python environment. In this particular case, the error means that a certain column contains two unique values (e.g. a and b), but the related pandas functions generate the indices [0, 1, 2]. Index 2 therefore has no unique value to point to, hence the error message.
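To make the mechanism concrete, here is a minimal sketch that reproduces the same kind of IndexError outside pandas. The names uniques and codes are hypothetical and only stand in for whatever pandas builds internally; this is not the pandas code path itself.

```python
import numpy as np

# Hypothetical stand-in for the two unique values found in the column.
uniques = np.array(["a", "b"])

# Hypothetical stand-in for the generated indices; 2 points past the end.
codes = np.array([0, 1, 2])

try:
    uniques[codes]          # look up one unique value per index
    message = None
except IndexError as exc:   # raised because index 2 has nothing to map to
    message = str(exc)

print(message)
```

Any index equal to or larger than the number of unique values fails in this way, which is why a single wrong label is enough to break the whole groupby() call.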
Since the error did not seem to follow any obvious pattern, I dug into the pandas code and tracked the source of the problem down to the function decons_group_index() in the sorting.py file. The issue can be illustrated with the following piece of code.
import numpy as np

# Values taken from the failing case
x = np.array([2076999867579399,
              2077965839147919,
              2078931810716439,
              2079897782284959,
              2080863753853479,
              2081829725421999,
              2082795696990519,
              2083761668559039])
y = np.array([0, 0, 0, 0, 0, 0, 0, 0])
factor = 160995261420
shape = 1

# Same expression as in decons_group_index()
labels = (x - y) % (factor * shape) // factor
print(labels)
If I run the code in Python 3.7.3.final.0, I get [0 0 0 0 0 0 0 0], which is the expected behavior. However, if I run it in Python 3.9.6.final.0, I get [1 1 1 1 1 1 1 1], which triggers the type of error described above.
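As a sanity check on which of the two results is right, the same expression can be evaluated with plain Python integers, which have arbitrary precision and do not go through NumPy at all. This is only a sketch of the arithmetic, not a fix: because shape is 1, the remainder of a division by factor is always smaller than factor, so the subsequent floor division by factor has to give 0.

```python
# Same values as in the NumPy snippet, but as plain Python ints.
x = [2076999867579399,
     2077965839147919,
     2078931810716439,
     2079897782284959,
     2080863753853479,
     2081829725421999,
     2082795696990519,
     2083761668559039]
y = [0] * len(x)
factor = 160995261420
shape = 1

# % and // have the same precedence and associate left to right,
# so this mirrors (x - y) % (factor * shape) // factor exactly.
expected = [(xi - yi) % (factor * shape) // factor for xi, yi in zip(x, y)]
print(expected)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

Comparing this list with the NumPy output in a given environment shows whether that environment is affected.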
I wonder whether you have experienced anything similar and whether there is a simple and elegant way to fix the issue. I am also not sure whether this should be considered a bug and reported somewhere.
Many thanks in advance,
Macky
OK, so it turned out to be a bug in NumPy. Reported here.
Macky