Search code examples
pandasmulti-indexpandas-styles

Pandas MultiIndex Dataframe Styling error when writing to Excel


I am trying to write a multi-index data frame to excel using pandas styling and I am getting an error.

import pandas as pd
import numpy as np

df=pd.DataFrame(np.random.randn(9,4), pd.MultiIndex.from_product([['A', 'B','C'], ['r1', 'r2','r3']]), columns=[['E1','E1','E2','E2'],['d1','d2','d1','d2']])

def highlight_max(s, props=''):
    return np.where(s == np.nanmax(s.values), props, '')

def highlight_all_by_condition (value, condition, props=''):
    return np.where(value >= condition, props, '')

def highlight_max_value_by_condition(value, condition, props=''):
    return np.where(np.nanmax(value) >= condition, props, '')

df_formatted = df.style.set_properties(**{'font-family': 'Arial','font-size': '10pt'})

unique_column_list = list(set(df.columns.get_level_values(0)))
idx = pd.IndexSlice
for each in unique_column_list:
    slice_=idx[idx[each]]
    df_formatted = df_formatted.apply(highlight_max, props='color:black; font-weight: bold', axis=1, subset=slice_)\
                               .apply(highlight_all_by_condition, condition = 0.55, props='color:red;font-weight: bold; background-color: #ffe6e6', axis=1, subset=slice_)\
                               .apply(highlight_max_value_by_condition, condition = 1, props='color:green;font-weight: bold; background-color: #ffff33', axis=1, subset=slice_)

df_formatted.to_excel("test.xlsx", engine = 'openpyxl')

I am getting the following error:

ValueError: Function <function highlight_max_value_by_condition at 0x000001EE1394E940> returned the wrong shape.
Result has shape: (9,)
Expected shape:   (9, 2)

The second styling function (highlight_max_value_by_condition) is a conditional styling, where it needs to highlight the max value only if it satisfies the condition and if I remove that styling function, then I don't get any error.

Any help is much appreciated. Thanks in advance.


Solution

  • Assuming we're looking for highlight_max_value_by_condition is meant to apply styles to cells which are both the max in the subset and fulfill the condition we can add an & to combine the conditions:

    def highlight_max_value_by_condition(value, condition, props=''):
        return np.where(
            (value == np.nanmax(value)) & (value >= condition),
            props,
            ''
        )
    

    Beyond that, however, there are quite a few things we can do to cleanup the general approach.

    Styler objects compound naturally, there is no need to assign back. In addition instead of using list(set( to get the level values, MultiIndex.levels will already provide the unique values for each level. Furthermore, since we're working with the top-most level we don't need pd.IndexSlice since access column access by top-level MultiIndex key will provide all child columns.

    All this together means that df_formatted can be build like:

    df_formatted = df.style.set_properties(**{
        'font-family': 'Arial',
        'font-size': '10pt'
    })
    
    for slice_ in df.columns.levels[0]:
        df_formatted.apply(
            highlight_max,
            props='color:black; font-weight: bold',
            axis=1, subset=slice_
        ).apply(
            highlight_all_by_condition, condition=0.55,
            props='color:red;font-weight: bold; background-color: #ffe6e6',
            axis=1, subset=slice_
        ).apply(
            highlight_max_value_by_condition, condition=1,
            props='color:green;font-weight: bold; background-color: #ffff33',
            axis=1, subset=slice_
        )
    

    styled table


    Setup made reproducible with seed(6) and with modified function

    import numpy as np
    import pandas as pd
    
    np.random.seed(6)
    df = pd.DataFrame(
        np.random.randn(9, 4),
        pd.MultiIndex.from_product([['A', 'B', 'C'], ['r1', 'r2', 'r3']]),
        columns=[['E1', 'E1', 'E2', 'E2'], ['d1', 'd2', 'd1', 'd2']]
    )
    
    
    def highlight_max(s, props=''):
        return np.where(s == np.nanmax(s.values), props, '')
    
    
    def highlight_all_by_condition(value, condition, props=''):
        return np.where(value >= condition, props, '')
    
    
    def highlight_max_value_by_condition(value, condition, props=''):
        return np.where(
            (value == np.nanmax(value)) & (value >= condition),
            props,
            ''
        )
    

    df:

                E1                  E2          
                d1        d2        d1        d2
    A r1 -0.311784  0.729004  0.217821 -0.899092
      r2 -2.486781  0.913252  1.127064 -1.514093
      r3  1.639291 -0.429894  2.631281  0.601822
    B r1 -0.335882  1.237738  0.111128  0.129151
      r2  0.076128 -0.155128  0.634225  0.810655
      r3  0.354809  1.812590 -1.356476 -0.463632
    C r1  0.824654 -1.176431  1.564490  0.712705
      r2 -0.181007  0.534200 -0.586613 -1.481853
      r3  0.857248  0.943099  0.114441 -0.021957