python pandas index-error

IndexError: index 0 is out of bounds for axis 0 with size 0 when trying to find the mode (most frequent value)


I concatenated 500 XLSX files; the resulting DataFrame has the shape (672006, 12). Each process has a unique number, by which I want to groupby() the data to obtain relevant information. For temperature I would like to select the first value, and for height the most frequent value.

Test data:

import pandas as pd

df_test = pd.DataFrame({"number": [1,1,1,1,2,2,2,2,3,3,3,3],
'temperature': [2,3,4,5,4,3,4,5,5, 3, 4, 4],
'height': [100, 100, 0, 100, 100, 90, 90, 100, 100, 90, 80, 80]})

# First temperature per number
df_test.groupby('number')['temperature'].first()

# Most frequent height per number
df_test.groupby('number')['height'].agg(lambda x: x.value_counts().index[0])

On the full dataset, I get the following error when trying to get the most frequent height per number: IndexError: index 0 is out of bounds for axis 0 with size 0

Strangely enough, mean() / first() / max() etc. all work. On the second part of the dataset, which I concatenated separately, the aggregation also worked.

Can somebody suggest what to do with this error? Thanks!


Solution

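  • I think the problem is that one or more of your groups contain only NaN heights. value_counts() drops NaN by default, so for such a group it returns an empty Series and .index[0] raises the IndexError.

    See this example, where I added a number 4 with np.nan as its heights.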

    import numpy as np
    import pandas as pd

    df_test = pd.DataFrame({"number": [1,1,1,1,2,2,2,2,3,3,3,3,4,4], 
    'temperature': [2,3,4,5,4,3,4,5,5, 3, 4, 4, 5, 5], 
    'height': [100, 100, 0, 100, 100, 90, 90, 100, 100, 90, 80, 80, np.nan, np.nan]})
    
    df_test.groupby('number')['temperature'].first()
    
    # Group 4 has only NaN heights, so value_counts() is empty and index[0] fails
    df_test.groupby('number')['height'].agg(lambda x: x.value_counts().index[0])
    

    Output:

    IndexError: index 0 is out of bounds for axis 0 with size 0
    

    Let's fill those NaN values with zero and rerun. Note that 0 is then counted like any other height, so group 4 reports 0.0.

    df_test = pd.DataFrame({"number": [1,1,1,1,2,2,2,2,3,3,3,3,4,4], 
    'temperature': [2,3,4,5,4,3,4,5,5, 3, 4, 4, 5, 5], 
    'height': [100, 100, 0, 100, 100, 90, 90, 100, 100, 90, 80, 80, np.nan, np.nan]})
    
    df_test = df_test.fillna(0) #Add this line
    df_test.groupby('number')['temperature'].first()
    
    df_test.groupby('number')['height'].agg(lambda x: x.value_counts().index[0])
    

    Output:

    number
    1    100.0
    2     90.0
    3     80.0
    4      0.0
    Name: height, dtype: float64
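
If overwriting missing heights with 0 is not desirable (0 is also a real height in the test data), the aggregation itself can be guarded instead. The following is only a minimal sketch under that assumption: `most_frequent` is a hypothetical helper name, not part of pandas, and returning NaN for an all-NaN group is one possible choice among several. It also lists which groups have no valid heights, which may help locate the offending files.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({
    "number": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    "temperature": [2, 3, 4, 5, 4, 3, 4, 5, 5, 3, 4, 4, 5, 5],
    "height": [100, 100, 0, 100, 100, 90, 90, 100, 100, 90, 80, 80, np.nan, np.nan],
})

# count() ignores NaN, so a group with only NaN heights has a count of 0.
non_null = df_test.groupby("number")["height"].count()
all_nan_groups = non_null[non_null == 0].index.tolist()


def most_frequent(s):
    """Hypothetical helper: most frequent non-NaN value, or NaN if there is none."""
    counts = s.value_counts()  # NaN is dropped by default
    return counts.index[0] if len(counts) else np.nan


# Same aggregation as above, but all-NaN groups yield NaN instead of raising.
height_mode = df_test.groupby("number")["height"].agg(most_frequent)
```

With this approach the original data stays untouched, and groups without any valid height remain visibly missing in the result. When two values are equally frequent within a group (as 100 and 90 are for number 2), which one comes first in value_counts() may depend on the pandas version.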