I was trying to style a Pandas Dataframe using 2 different columns. I was successful as long as the conditional was about the column itself, however when it depends on another one, I was unable to get the desired results.
I want to color the cell in "Date II" if "Date I" is in the past.
def date_pii(row):
ret = ["" for _ in row.index]
print(row['Date PI'])
if row['Date PI'] < datetime.now():
ret[row.index.get_loc("Date PII")] = "background-color: red"
return ret
styler = df3.style \
.applymap(lambda x: 'background-color: %s' % 'red' if x <= datetime.now() else '', subset=['Date PI']) \
.applymap(lambda x: 'background-color: %s' % 'yellow' if x < datetime.now() + timedelta(days=30) else '',
subset=['Date PII']) \
.applymap(lambda x: 'background-color: %s' % 'orange' if x <= datetime.now() else '', subset=['Date PII']) \
.applymap(lambda x: 'background-color: %s' % 'grey' if pd.isnull(x) else '', subset=['Date PI'])\
.applymap(lambda x: 'background-color: %s' % 'grey' if pd.isnull(x) else '', subset=['Date PII'])\
.apply(date_pii, axis=1) ----> THIS IS THE ISSUE
styler.to_excel(writer, sheet_name='Report Paris', index=False)
At runtime I get the following error:
ValueError: Function <function generate_report_all.<locals>.date_pii at 0x7fd3964d9160> returned the wrong shape.
Result has shape: (532,)
Expected shape: (532, 10)
The dataframe looks like this:
The first orange cell in "Date PII" is correct, however, the remaining ones (where PI is red) I would like them to become red as well.
Thanks for the help!
The general approach to this type of problem is to pass the specified columns as a subset
to Styler.apply
. This allows us to create styles at the DataFrame level and use loc
indexing to build out the styles based on conditions. The other major benefit is that rather than chaining we can use the extra space to provide documentation and also reduce the overhead of all of those lambda:
def style_dates(subset_df):
# Empty Styles
style_df = pd.DataFrame(
'', index=subset_df.index, columns=subset_df.columns
)
# Today's Date
today = pd.Timestamp.now().normalize()
# Date PII is within 30 days from today
style_df.loc[
subset_df['Date PII'].le(today + pd.Timedelta(days=30)),
'Date PII'
] = 'background-color: yellow'
# Date PI is before today
style_df.loc[
subset_df['Date PI'].lt(today),
['Date PI', 'Date PII']
] = 'background-color: red'
# Date PII is before today and Date PI is after Today
style_df.loc[
subset_df['Date PII'].lt(today) & subset_df['Date PI'].gt(today),
'Date PII'
] = 'background-color: orange'
# Either is NaN
style_df[subset_df.isna()] = 'background-color: gray'
return style_df
styler = df3.style.apply(
style_dates, axis=None, subset=['Date PII', 'Date PI']
).format(
# Optional Explicit Date Format
formatter='{:%Y-%m-%d}', na_rep='NaT', subset=['Date PII', 'Date PI']
)
Setup DataFrame randomly generated always relative to the current date (styles will be consistent while dates will not be):
import numpy as np
import pandas as pd
from numpy.random import Generator, MT19937
norm_today = pd.Timestamp.now().normalize()
rng = Generator(MT19937(1023))
def md(lower_bound, upper_bound, rng_=rng):
return pd.Timedelta(days=rng_.integers(lower_bound, upper_bound))
df3 = pd.DataFrame({
'Desc': [
'PII within 30 days', # PII yellow
'PII in past and PI in future', # PII orange
'PI past', # Both red
'PI empty', # grey
'PII empty', # grey
'PII in future but not within 30 days' # No Styles
],
'Date PII': [norm_today + md(1, 10), norm_today - md(1, 10),
norm_today, norm_today, np.nan,
norm_today + md(40, 50)],
'Date PI': [norm_today, norm_today + md(1, 10),
norm_today - md(1, 10), np.nan, norm_today,
norm_today]
})
Desc | Date PII | Date PI |
---|---|---|
PII within 30 days | 2021-11-06 00:00:00 | 2021-11-03 00:00:00 |
PII in past and PI in future | 2021-10-31 00:00:00 | 2021-11-11 00:00:00 |
PI past | 2021-11-03 00:00:00 | 2021-11-01 00:00:00 |
PI empty | 2021-11-03 00:00:00 | NaT |
PII empty | NaT | 2021-11-03 00:00:00 |
PII in future but not within 30 days | 2021-12-19 00:00:00 | 2021-11-03 00:00:00 |