Search code examples
pythonpandasdataframepandas-styles

Pandas Styling with Conditional Rules


I was trying to style a Pandas Dataframe using 2 different columns. I was successful as long as the conditional was about the column itself, however when it depends on another one, I was unable to get the desired results.

I want to color the cell in "Date II" if "Date I" is in the past.

def date_pii(row):
        ret = ["" for _ in row.index]
        print(row['Date PI'])
        if row['Date PI'] < datetime.now():
            ret[row.index.get_loc("Date PII")] = "background-color: red"
            return ret

styler = df3.style \
        .applymap(lambda x: 'background-color: %s' % 'red' if x <= datetime.now() else '', subset=['Date PI']) \
        .applymap(lambda x: 'background-color: %s' % 'yellow' if x < datetime.now() + timedelta(days=30) else '',
                  subset=['Date PII']) \
        .applymap(lambda x: 'background-color: %s' % 'orange' if x <= datetime.now() else '', subset=['Date PII']) \
        .applymap(lambda x: 'background-color: %s' % 'grey' if pd.isnull(x) else '', subset=['Date PI'])\
        .applymap(lambda x: 'background-color: %s' % 'grey' if pd.isnull(x) else '', subset=['Date PII'])\
        .apply(date_pii, axis=1) ----> THIS IS THE ISSUE

styler.to_excel(writer, sheet_name='Report Paris', index=False)

At runtime I get the following error:

ValueError: Function <function generate_report_all.<locals>.date_pii at 0x7fd3964d9160> returned the wrong shape.
Result has shape: (532,)
Expected shape:   (532, 10)

The dataframe looks like this: df3

The first orange cell in "Date PII" is correct, however, the remaining ones (where PI is red) I would like them to become red as well.

Thanks for the help!


Solution

  • The general approach to this type of problem is to pass the specified columns as a subset to Styler.apply. This allows us to create styles at the DataFrame level and use loc indexing to build out the styles based on conditions. The other major benefit is that rather than chaining we can use the extra space to provide documentation and also reduce the overhead of all of those lambda:

    def style_dates(subset_df):
        # Empty Styles
        style_df = pd.DataFrame(
            '', index=subset_df.index, columns=subset_df.columns
        )
        # Today's Date
        today = pd.Timestamp.now().normalize()
        # Date PII is within 30 days from today
        style_df.loc[
            subset_df['Date PII'].le(today + pd.Timedelta(days=30)),
            'Date PII'
        ] = 'background-color: yellow'
    
        # Date PI is before today
        style_df.loc[
            subset_df['Date PI'].lt(today),
            ['Date PI', 'Date PII']
        ] = 'background-color: red'
    
        # Date PII is before today and Date PI is after Today
        style_df.loc[
            subset_df['Date PII'].lt(today) & subset_df['Date PI'].gt(today),
            'Date PII'
        ] = 'background-color: orange'
    
        # Either is NaN
        style_df[subset_df.isna()] = 'background-color: gray'
        return style_df
    
    
    styler = df3.style.apply(
        style_dates, axis=None, subset=['Date PII', 'Date PI']
    ).format(
        # Optional Explicit Date Format
        formatter='{:%Y-%m-%d}', na_rep='NaT', subset=['Date PII', 'Date PI']
    )
    

    styled DataFrame


    Setup DataFrame randomly generated always relative to the current date (styles will be consistent while dates will not be):

    import numpy as np
    import pandas as pd
    from numpy.random import Generator, MT19937
    
    norm_today = pd.Timestamp.now().normalize()
    rng = Generator(MT19937(1023))
    
    
    def md(lower_bound, upper_bound, rng_=rng):
        return pd.Timedelta(days=rng_.integers(lower_bound, upper_bound))
    
    
    df3 = pd.DataFrame({
        'Desc': [
            'PII within 30 days',  # PII yellow
            'PII in past and PI in future',  # PII orange
            'PI past',  # Both red
            'PI empty',  # grey
            'PII empty',  # grey
            'PII in future but not within 30 days'  # No Styles
        ],
        'Date PII': [norm_today + md(1, 10), norm_today - md(1, 10),
                     norm_today, norm_today, np.nan,
                     norm_today + md(40, 50)],
        'Date PI': [norm_today, norm_today + md(1, 10),
                    norm_today - md(1, 10), np.nan, norm_today,
                    norm_today]
    })
    
    Desc Date PII Date PI
    PII within 30 days 2021-11-06 00:00:00 2021-11-03 00:00:00
    PII in past and PI in future 2021-10-31 00:00:00 2021-11-11 00:00:00
    PI past 2021-11-03 00:00:00 2021-11-01 00:00:00
    PI empty 2021-11-03 00:00:00 NaT
    PII empty NaT 2021-11-03 00:00:00
    PII in future but not within 30 days 2021-12-19 00:00:00 2021-11-03 00:00:00