Search code examples
pythonpandasstyles

Highlight values based on multiple criteria


I want to shade the values (cells) that equal or exceed the guideline values (currently in a dictionary, key (chemical symbol):value (guideline limit). I want it to deal with multiple sets of guidelines and shades the cell according to the guideline with the highest value that is being exceeded. The table below shows the desired output, with the guidelines assigned a color, and the exceedances of each guide assigned the same colour.

enter image description here

This shows the limits for three different guidelines (ANZECC, WHO, and US EPA) for Cu, Cd, SO4 and Zn. The results are below (site 1 -3) and are highlighted according to which (if any) guidelines they exceed. So for the first parameter, Cu, site 2 only exceeds the US EPA value so is shaded yellow. Site 1 all the guidelines the highest is the WHO value so is shaded blue. Site 3 also exceeds all the guidelines, the maximum is WHO, so is shaded blue as well. Cd for site 1 exceeds US EPA and ANZECC (which are the same) so is shaded green (but could have been shaded yellos as the US EPA limit is the same). Site 2 doesn't exceed any of the guidelines so is left unshaded. Site 3 exceeds all the guidelines, the highest for Cd is WHO, so is shaded blue. and so on.

I can shade exceedances against one guideline using the code below,

#example dataframe
df=pd.DataFrame(({'Cu':[0.004, 0.0017, 0.1],  'Cd': ['0.001','0.0005',1], 'SO4': [700,"450",1500],'Zn': ['0.15','0.1','0.25'],}))
cols=df.columns

Gives

Out[7]:
      Cd      Cu   SO4    Zn
0   0.001  0.0040   700  0.15
1  0.0005  0.0017   450   0.1
2       1  0.1000  1500  0.25

#guideline values
WHO=  {'Cd' :0.002 ,'Cu' :0.003 ,'SO4':"NaN",'Zn' :0.1}
ANZECC=  {'Cd' :0.001 ,'Cu' :0.002 ,'SO4':1000.0,'Zn' :0.2}
US_EPA=  {'Cd' :0.001 ,'Cu' :0.0015 ,'SO4':500.0,'Zn' :0.01}

#highlight the relevant cells that are greater than the guideline
def fill_exceedances(x):
    color = 'orange'
    #get columns which are in keys of dict
    c = x.columns.intersection(ANZECC.keys())
    #filter columns and rename by dict
    df2 = x[c].rename(columns=ANZECC)
    #create boolean mask only for matched columns and compare
    mask = df2.astype(float).values < df2.columns[None,:].values
    #new DataFrame filled by no color
    df1 = pd.DataFrame('', index=x.index, columns=c)
    #set color by mask and add missing non matched columns names by reindex
    df1 = (df1.where(mask, 'background-color: {}'.format(color))
              .reindex(columns=x.columns, fill_value=''))

    return df1

df.style.apply(fill_exceedances, axis=None).to_excel('styled.xlsx', engine='openpyxl')

This gives, for the ANZECC guidelines:

enter image description here

I can't figure out how to apply successive styles as the previous styles are lost if I repeat the process. I feel I am approaching this the wrong way and successive styling from the minimum guideline upwards wont work.

UPDATE: I have put the guidelines into one dictionary, and can sort them and assign a colour based on the guideline in use, but I am stuck with how to proceed.

guidelines={"CEPA_FW":  {'Sulphate':1000,'Cd' :0.01 ,'Cu' :1.0 ,'Pb' :0.005 ,'Zn' :5.0}, 
 "ANZECC_Stock":{'Sulphate':1000,'Cd' :0.01,'Cu' :0.4, 'Zn' :20},
 "ANZECC_FW":  {'Sulphate':1000,'Cd' :0.0002 ,'Cu' :0.0014 ,'Pb' :0.0034 ,'Zn' :0.008}}


df=pd.DataFrame.from_records(guidelines).T
df

Gives:

Out[3]:
                  Cd      Cu      Pb  Sulphate      Zn
ANZECC_FW     0.0002  0.0014  0.0034    1000.0   0.008
ANZECC_Stock  0.0100  0.4000     NaN    1000.0  20.000
CEPA_FW       0.0100  1.0000  0.0050    1000.0   5.000

Then sort the guidelines and assign colour, with:

fill_color={'ANZECC_FW':'blue',
'CEPA_FW' :'green',
'ANZECC_Stock': 'yellow'}

for i in df.columns:
    col=df[i]
    col=col.sort_values(ascending=True)
    #print col,i, col.idxmax()
    #print i
    #print col
    count=0
    for val in col:

        if val>0:

            print i,val,col.index[count],fill_color[col.index[count]]

            count+=1
        else:
            continue

Gives:


Cd 0.0002 ANZECC_FW blue
Cd 0.01 ANZECC_Stock yellow
Cd 0.01 CEPA_FW green
Cu 0.0014 ANZECC_FW blue
Cu 0.4 ANZECC_Stock yellow
Cu 1.0 CEPA_FW green
Pb 0.0034 ANZECC_FW blue
Pb 0.005 CEPA_FW green
Sulphate 1000.0 ANZECC_FW blue
Sulphate 1000.0 ANZECC_Stock yellow
Sulphate 1000.0 CEPA_FW green
Zn 0.008 ANZECC_FW blue
Zn 5.0 CEPA_FW green
Zn 20.0 ANZECC_Stock yellow

So I have all the info, I just need to loop through these in order for each column in the data (if cell value >= list value, assign the fill color) which should work, but I am stuck with how to apply this.


Solution

  • First, few problems in your question:

    1. Your data are not consistent: The code provided to generate df contains some strings mixed with floats. If this really is the case in your data, the first thing you should do is converting all to float:

      df = df.apply(pd.to_numeric)

    2. Your guidelines change throughout the post. For example:

    You started with guidelines as

    WHO=  {'Cd' :0.002 ,'Cu' :0.003 ,'SO4':"NaN,'Zn' :0.1}       # string `NaN` again?
    ANZECC=  {'Cd' :0.001 ,'Cu' :0.002 ,'SO4':1000.0,'Zn' :0.2}
    US_EPA=  {'Cd' :0.001 ,'Cu' :0.0015 ,'SO4':500.0,'Zn' :0.01}
    

    Then, later on you build your guidelines dataframe, which you must have mistakenly named df as well, as:

    # no string `NaN`
    # Sulphate or SO4
    # different standard names
    guidelines={"CEPA_FW":  {'Sulphate':1000,'Cd' :0.01 ,'Cu' :1.0 ,'Pb' :0.005 ,'Zn' :5.0}, 
     "ANZECC_Stock":{'Sulphate':1000,'Cd' :0.01,'Cu' :0.4, 'Zn' :20},
     "ANZECC_FW":  {'Sulphate':1000,'Cd' :0.0002 ,'Cu' :0.0014 ,'Pb' :0.0034 ,'Zn' :0.008}}
    

    This is totally different from the former.


    That said, let's build a new guidelines and fill_color based on your picture:

    guidelines = {'WHO' :  {'Cd' :0.002 ,'Cu' :0.003 ,'SO4':np.NaN,'Zn' :0.1},
                  'ANZECC' : {'Cd' :0.001 ,'Cu' :0.002 ,'SO4':1000.0,'Zn' :0.2},
                  'US_EPA' :  {'Cd' :0.001 ,'Cu' :0.0015 ,'SO4':500.0,'Zn' :0.01}
                  }
    guidelines = pd.DataFrame(guidelines).T
    
    fill_color = {'WHO':'teal',
                  'US_EPA' :'yellow',
                  'ANZECC': 'green'}
    

    And here comes the function:

    def hightlight(col):
        name = col.name
    
        # extract the threshold and sort decreasingly
        thresh = guidelines[name].sort_values(ascending=False)
    
        # compare each value in column to each threshold
        compare = np.greater_equal.outer(col.values, thresh.values)
    
        # if any threshold is exceeded
        exceed_thresh = compare.any(1)
    
        # and where it is exceeded
        exceed_idx = np.argmax(compare, axis=1)
    
        # extract the standards that has is passed
        standards = np.where(exceed_thresh, thresh.index[exceed_idx], '')
    
        # format strings
        return [f'background-color:{fill_color[s]}' if s else '' for s in standards ]
    
    df.style.apply(hightlight)
    

    Output:

    enter image description here