
Applying pandas styles to arbitrary (non-product) subsets of a dataframe


How does one apply a style to an arbitrary subset of a pandas dataframe? Specifically, I have a dataframe df that contains some NaNs, and I want to apply a background gradient to it everywhere except where there are NaNs (with the same colormap applied to all cells).

I know that background_gradient (and applymap more generally) has a subset parameter, but I do not understand from the documentation how to use it to select an arbitrary subset of the dataframe.

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'A': [0, 1, np.nan], 'B': [.5, np.nan, 0], 'C': [np.nan, 1, 1]})
mask = ~pd.isnull(df)

Then if I try

df.style.background_gradient(subset=mask)

I get the error:

IndexingError: Too many indexers

I know how to apply a style to a subset of a dataframe in the specific case where that subset is a Cartesian product of index labels and columns, using something like the solution in "How do I style a subset of a pandas dataframe?". So the question is what to do when the subset is not such a product, as in the example above.

One solution might be to loop through the columns and apply the style column by column (each application is then to a Cartesian product subset). In my case, I can pass the low and high parameters to background_gradient to force the colormaps to match up between columns, but that fails when (as above) one or more of those columns contains only a single distinct non-NaN value. This in turn could be bypassed by rewriting the background_gradient function, but that's clearly undesirable.


Solution

  • You can write a custom function for this:

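    import matplotlib.pyplot as plt
    
    # matplotlib.cm.get_cmap was removed in matplotlib 3.9; pyplot.get_cmap still works
    cmap = plt.get_cmap('PuBu')
    
    # style a single cell; low/high set the value range mapped onto the colormap
    def threshold(x, low=0, high=1, mid=0.5):
        # NaN cell: leave unstyled
        if np.isnan(x):
            return ''
    
        # non-NaN cell: rescale to [0, 1] and look up the color
        x = (x - low) / (high - low)
        r, g, b, _ = cmap(x, bytes=True)
        background = f'background-color: rgb({r}, {g}, {b})'
        # use white text on the darker end of the colormap
        text_color = 'color: white' if x > mid else ''
        return background + ';' + text_color
    
    # apply the style cell by cell
    # (on pandas >= 2.1, Styler.map is the non-deprecated name for applymap)
    df.style.applymap(threshold, low=-1, high=1, mid=0.3)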
    
