Search code examples
python-3.xpandasjupyter-notebookduplicateshighlight

Pandas style applymap highlight duplicates with lambda function


I have a Pandas dataframe and am working in a Jupyter notebook. I want to highlight rows in which column pairs are duplicated. Here is an example:

colA = list(range(1,6))
colB = ['aa', 'bb', 'aa', 'cc', 'aa']
colC = [14,3,14,9,12]
colD = [108, 2001, 152, 696, 696]
df = pd.DataFrame(list(zip(colA, colB, colC, colD)), columns =['colA', 'colB', 'colC', 'colD']) 
display(df)

enter image description here

I want to highlight these rows, because the values in colB and colC are duplicates:

enter image description here

I am trying this lambda function, but it throws an error (and it's only for one column):

df.style.applymap(lambda x: 'background-color : yellow' if x[colB].duplicated(keep=False) else '')
TypeError: ("'int' object is not subscriptable", 'occurred at index colA')

Thanks for any help


Solution

  • Personally, I would break the problem into two steps rather than use one complicated lambda function. We can find the index of all the duplicate rows, then highlight the rows by index number. Also don't forget that in your lambda function, you should use a list comprehension in what you are returning.

    rows_series = df[['colB','colC']].duplicated(keep=False)
    rows = rows_series[rows_series].index.values
    df.style.apply(lambda x: ['background: yellow' if x.name in rows else '' for i in x], axis=1)
    

    enter image description here