Highlight the cells in different color if not exact dup

I have this dataframe. If the Description is the same then the job entry should be exactly the same.

mycol = ['Title', 'Location', 'Company', 'Salary', 'Sponsored', 'Description']
mylist=[('a', 'b', 'c', 'd', 'e', 'f'),
('a', 'b', 'c', 'd2', 'e', 'f'),
('g', 'h', 'i', 'j', 'k', 'l' ),
('g1', 'h', 'i', 'j', 'k', 'l' ),
('n', 'o', 'p', 'q', 'r', 's'),
('n1', 'o', 'p', 'q', 'r', 's')
]

df = pd.DataFrame(mylist, columns = mycol)

I will like to highlight the differences in yellow background as shown in this image...

Is it possible in pandas?

Or I can export in excel and process it using VBA. I am trying to achieve this in pandas and then export to excel along with formatting.

Update:

Someone suggested to use this:

# Select all Columns but Description
cols = df.columns.symmetric_difference(['Description'])
# Clear All columns where Description is duplicated
df.loc[df['Description'].duplicated(), cols] = np.nan
# Fill foward over the blanks
df = df.ffill()

But it will replace the values and not highlight it.

Solution

We can clear the rows where the description is duplicated, then use groupby ffill to fill the values forward per Description:

mask = df.copy(deep=True)
# Select all Columns but Description
cols = mask.columns.symmetric_difference(['Description'])
# Clear All columns where Description is duplicated
mask.loc[mask['Description'].duplicated(), cols] = np.nan
# Fill foward over the blanks
mask = mask.groupby(df['Description'].values).ffill()

mask:

  Title Location Company Salary Sponsored Description
0     a        b       c      d         e           f
1     a        b       c      d         e           f
2     g        h       i      j         k           l
3     g        h       i      j         k           l
4     n        o       p      q         r           s
5     n        o       p      q         r           s

This can then become our point to compare against:

styles = (
    # Remove Where values are incorrect
    mask.where(mask.ne(df))
        # Back fill per group
        .groupby(df['Description'].values).bfill()
        # Anywhere values are not null
        .notnull()
        # Replace booleans with styling
        .replace({True: 'background-color: yellow;', False: ''})
)

df.style.apply(lambda _: styles, axis=None)

where and groupby bfill give us:

mask.where(mask.ne(df)).groupby(df['Description'].values).bfill()

  Title Location Company Salary Sponsored Description
0   NaN      NaN     NaN      d       NaN         NaN
1   NaN      NaN     NaN      d       NaN         NaN
2     g      NaN     NaN    NaN       NaN         NaN
3     g      NaN     NaN    NaN       NaN         NaN
4     n      NaN     NaN    NaN       NaN         NaN
5     n      NaN     NaN    NaN       NaN         NaN

Then notnull and replace allow styles to be set: styles:

                       Title Location Company                     Salary Sponsored Description
0                                              background-color: yellow;                      
1                                              background-color: yellow;                      
2  background-color: yellow;                                                                  
3  background-color: yellow;                                                                  
4  background-color: yellow;                                                                  
5  background-color: yellow;

Remember to write to_excel from the Styler object not the DataFrame:

df.style.apply(lambda _: styles, axis=None).to_excel('out.xlsx')