Search code examples
pythonpandasmulti-index

Using idxmax on a hierarchical dataframe


I'm trying to find the index of the maximum values in multiple columns in a multi-index Pandas dataframe.

        Kommune  Upplands  Vallentuna...   Kiruna
Year    Party  
1973    M        0.9       29.2      ...   20     
        KD       15        10        ...   2 
        MP       1.1       4         ...   5     
        V        6         7         ...   8  
        SD       NaN       NaN       ...   NaN
        L        10.1      13.5      ...   8.8 
1976    M        1.8       29.2      ...   20     
        KD       16        10        ...   2 
        MP       10        4         ...   5     
        V        15        7         ...   8    
        SD       NaN       NaN       ...   NaN
        L        11.9      15        ...   18
...     ...      ...       ...       ...   ... 
...     ...      ...       ...       ...   ... 
2014    M        28        22        ...   29     
        KD       4.5       13        ...   5 
        MP       11        8         ...   9     
        V        1.9       5         ...   10    
        SD       20        10        ...   5
        L        19        25        ...   1

The desired output is

Kommune  Upplands  Vallentuna...   Kiruna
Year      
1973     KD        M         ...   M
1976     V         M         ...   M
...      ...       ...       ...   ...
2014     M         L         ...   M  

I've tried using groupby (as suggested in a previous post on multi-index- Getting max values from pandas multiindex dataframe) but it returns a tuple for every position.

Kommune  Upplands          Vallentuna        ...   Kiruna
Year      
1973     (1973, KD)        (1973, M)         ...   (1973, M)
1976     (1976, V)         (1976, M)         ...   (1976, M)
...      ...               ...               ...   ...
2014     (2014, M)         (2014, L)         ...   (2014, M)

How do I get only the second element from each tuple? Or is there a more efficient way to find the indices?


Solution

  • Seems like you need

    df.stack().sort_values().groupby(level=[0,2]).tail(1).reset_index(level=1).Party.unstack()
    Out[544]: 
         Upplands Vallentuna Kiruna
    Year                           
    1973       KD          M      M
    1976       KD          M      M