I want to shade the values (cells) that equal or exceed the guideline values (currently in a dictionary, key (chemical symbol):value (guideline limit). I want it to deal with multiple sets of guidelines and shades the cell according to the guideline with the highest value that is being exceeded. The table below shows the desired output, with the guidelines assigned a color, and the exceedances of each guide assigned the same colour.
This shows the limits for three different guidelines (ANZECC, WHO, and US EPA) for Cu, Cd, SO4 and Zn. The results are below (site 1 -3) and are highlighted according to which (if any) guidelines they exceed. So for the first parameter, Cu, site 2 only exceeds the US EPA value so is shaded yellow. Site 1 all the guidelines the highest is the WHO value so is shaded blue. Site 3 also exceeds all the guidelines, the maximum is WHO, so is shaded blue as well. Cd for site 1 exceeds US EPA and ANZECC (which are the same) so is shaded green (but could have been shaded yellos as the US EPA limit is the same). Site 2 doesn't exceed any of the guidelines so is left unshaded. Site 3 exceeds all the guidelines, the highest for Cd is WHO, so is shaded blue. and so on.
I can shade exceedances against one guideline using the code below,
#example dataframe
df=pd.DataFrame(({'Cu':[0.004, 0.0017, 0.1], 'Cd': ['0.001','0.0005',1], 'SO4': [700,"450",1500],'Zn': ['0.15','0.1','0.25'],}))
cols=df.columns
Gives
Out[7]:
Cd Cu SO4 Zn
0 0.001 0.0040 700 0.15
1 0.0005 0.0017 450 0.1
2 1 0.1000 1500 0.25
#guideline values
WHO= {'Cd' :0.002 ,'Cu' :0.003 ,'SO4':"NaN",'Zn' :0.1}
ANZECC= {'Cd' :0.001 ,'Cu' :0.002 ,'SO4':1000.0,'Zn' :0.2}
US_EPA= {'Cd' :0.001 ,'Cu' :0.0015 ,'SO4':500.0,'Zn' :0.01}
#highlight the relevant cells that are greater than the guideline
def fill_exceedances(x):
color = 'orange'
#get columns which are in keys of dict
c = x.columns.intersection(ANZECC.keys())
#filter columns and rename by dict
df2 = x[c].rename(columns=ANZECC)
#create boolean mask only for matched columns and compare
mask = df2.astype(float).values < df2.columns[None,:].values
#new DataFrame filled by no color
df1 = pd.DataFrame('', index=x.index, columns=c)
#set color by mask and add missing non matched columns names by reindex
df1 = (df1.where(mask, 'background-color: {}'.format(color))
.reindex(columns=x.columns, fill_value=''))
return df1
df.style.apply(fill_exceedances, axis=None).to_excel('styled.xlsx', engine='openpyxl')
This gives, for the ANZECC guidelines:
I can't figure out how to apply successive styles as the previous styles are lost if I repeat the process. I feel I am approaching this the wrong way and successive styling from the minimum guideline upwards wont work.
UPDATE: I have put the guidelines into one dictionary, and can sort them and assign a colour based on the guideline in use, but I am stuck with how to proceed.
guidelines={"CEPA_FW": {'Sulphate':1000,'Cd' :0.01 ,'Cu' :1.0 ,'Pb' :0.005 ,'Zn' :5.0},
"ANZECC_Stock":{'Sulphate':1000,'Cd' :0.01,'Cu' :0.4, 'Zn' :20},
"ANZECC_FW": {'Sulphate':1000,'Cd' :0.0002 ,'Cu' :0.0014 ,'Pb' :0.0034 ,'Zn' :0.008}}
df=pd.DataFrame.from_records(guidelines).T
df
Gives:
Out[3]:
Cd Cu Pb Sulphate Zn
ANZECC_FW 0.0002 0.0014 0.0034 1000.0 0.008
ANZECC_Stock 0.0100 0.4000 NaN 1000.0 20.000
CEPA_FW 0.0100 1.0000 0.0050 1000.0 5.000
Then sort the guidelines and assign colour, with:
fill_color={'ANZECC_FW':'blue',
'CEPA_FW' :'green',
'ANZECC_Stock': 'yellow'}
for i in df.columns:
col=df[i]
col=col.sort_values(ascending=True)
#print col,i, col.idxmax()
#print i
#print col
count=0
for val in col:
if val>0:
print i,val,col.index[count],fill_color[col.index[count]]
count+=1
else:
continue
Gives:
Cd 0.0002 ANZECC_FW blue
Cd 0.01 ANZECC_Stock yellow
Cd 0.01 CEPA_FW green
Cu 0.0014 ANZECC_FW blue
Cu 0.4 ANZECC_Stock yellow
Cu 1.0 CEPA_FW green
Pb 0.0034 ANZECC_FW blue
Pb 0.005 CEPA_FW green
Sulphate 1000.0 ANZECC_FW blue
Sulphate 1000.0 ANZECC_Stock yellow
Sulphate 1000.0 CEPA_FW green
Zn 0.008 ANZECC_FW blue
Zn 5.0 CEPA_FW green
Zn 20.0 ANZECC_Stock yellow
So I have all the info, I just need to loop through these in order for each column in the data (if cell value >= list value, assign the fill color) which should work, but I am stuck with how to apply this.
First, few problems in your question:
Your data are not consistent: The code provided to generate df
contains some strings mixed with floats. If this really is the case in your data, the first thing you should do is converting all to float:
df = df.apply(pd.to_numeric)
Your guidelines
change throughout the post. For example:
You started with guidelines as
WHO= {'Cd' :0.002 ,'Cu' :0.003 ,'SO4':"NaN,'Zn' :0.1} # string `NaN` again?
ANZECC= {'Cd' :0.001 ,'Cu' :0.002 ,'SO4':1000.0,'Zn' :0.2}
US_EPA= {'Cd' :0.001 ,'Cu' :0.0015 ,'SO4':500.0,'Zn' :0.01}
Then, later on you build your guidelines
dataframe, which you must have mistakenly named df
as well, as:
# no string `NaN`
# Sulphate or SO4
# different standard names
guidelines={"CEPA_FW": {'Sulphate':1000,'Cd' :0.01 ,'Cu' :1.0 ,'Pb' :0.005 ,'Zn' :5.0},
"ANZECC_Stock":{'Sulphate':1000,'Cd' :0.01,'Cu' :0.4, 'Zn' :20},
"ANZECC_FW": {'Sulphate':1000,'Cd' :0.0002 ,'Cu' :0.0014 ,'Pb' :0.0034 ,'Zn' :0.008}}
This is totally different from the former.
That said, let's build a new guidelines
and fill_color
based on your picture:
guidelines = {'WHO' : {'Cd' :0.002 ,'Cu' :0.003 ,'SO4':np.NaN,'Zn' :0.1},
'ANZECC' : {'Cd' :0.001 ,'Cu' :0.002 ,'SO4':1000.0,'Zn' :0.2},
'US_EPA' : {'Cd' :0.001 ,'Cu' :0.0015 ,'SO4':500.0,'Zn' :0.01}
}
guidelines = pd.DataFrame(guidelines).T
fill_color = {'WHO':'teal',
'US_EPA' :'yellow',
'ANZECC': 'green'}
And here comes the function:
def hightlight(col):
name = col.name
# extract the threshold and sort decreasingly
thresh = guidelines[name].sort_values(ascending=False)
# compare each value in column to each threshold
compare = np.greater_equal.outer(col.values, thresh.values)
# if any threshold is exceeded
exceed_thresh = compare.any(1)
# and where it is exceeded
exceed_idx = np.argmax(compare, axis=1)
# extract the standards that has is passed
standards = np.where(exceed_thresh, thresh.index[exceed_idx], '')
# format strings
return [f'background-color:{fill_color[s]}' if s else '' for s in standards ]
df.style.apply(hightlight)
Output: