Search code examples
pythonpandaswarningsconcatenation

Pandas pivot table gives "FutureWarning: Sorting because non-concatenation axis is not aligned"


I have the following DataFrame:

df = pd.DataFrame({'category':['A', 'B', 'C', 'C'], 
                   'bar':[2, 5, float('nan'), float('nan')]})

And then I have just one line of code, where I'm trying to apply two aggregation functions on a column in my DataFrame, grouped by values in another column:

df.pivot_table('bar', 'category', aggfunc=['median', 'count'])

For some reason, it gives me the following warning:

FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default. To accept the future behavior, pass 'sort=False'. To retain the current behavior and silence the warning, pass 'sort=True'.

From what I can understand about this warning, it concerns "concat()" or "append()" methods. I called neither of these, so I can only assume that one of the two is used implicitly somewhere inside the "pivot_table()" method. I'd be happy to pass the "sort" parameter to silence the warning, but I don't see the way of doing that if the method is called implicitly.

I ran some tests on this example, and it looks like the warning appears only when all three of the following conditions are met:

1) there's at least one group in the values that get aggregated that consists entirely of missing values;

2) there are at least two aggregation functions;

3) one of the aggregation functions is "count()".

My working theory at the moment is that the two aggregation functions can't agree on how many rows the resulting pivot table should have. The "count()" function puts zeroes in all groups that consist entirely of missing values. But the other functions ignore such groups entirely, so the respective rows are simply missing from the pivot table when "count()" is not present. However, when "count()" is present, it forces the other functions to not ignore these groups, and creates NaN values in respective cells.

This result works fine for me, I can use it, but I don't like leaving warnings unattended. Thoughts on what can be done about it?


Solution

  • I was able to replicate the issue on pandas 0.25.1, the waning is related to pandas.core.reshape.pivot.py that includes the following statement

    # line 56
    return concat(pieces, keys=keys, axis=1)
    

    Concat is causing the warning. pieces is a list of dataframes where each elemenet is related to each function from the parameter aggfunc, what happens is the following:

    pieces[0]
    #           bar
    # category     
    # A         2.0
    # B         5.0
    
    pieces[1]
    #           bar
    # category     
    # A           1
    # B           1
    # C           0
    

    Since pieces[0] and pieces[1] have a different index, pandas needs to sort the dataframes to match between the values.

    This issue does not happen in 1.0.1. If you don't want the warning to show, add parameter dropna=False so the NaN-columns are all included in the aggfunction.

    df.pivot_table('bar', 'category', aggfunc=['median', 'count'], dropna=False)
    

    Be careful, some funciones are not meant to be used with nan values, numpy inlcudes al lot of functions that handles nan like np.nanmedian and np.nanmax consider checking those out.