Search code examples
python-3.xpandasdata-analysis

How to drop duplicate values of columns in excel using pandas with particular condition


I have table where i want to remove duplicate values from particular column and want single value for those duplicate values.

Input table:

col1  col2 col3  col4

afsdd  asdf  fff   d

b       f    m     d

dfdf    d    dd    d

df      ff    f    g

dd      dl    jf   g

ddd     dll   dfd  h

Desired output:

col1  col2 col3  col4

afsdd  asdf  fff   d

b       f    m     NA(empty)

dfdf    d    dd    NA(empty)

df      ff    f    g

dd      dl    jf   NA(empty)

ddd     dll   dfd  h

Here i want to remove duplicate column values by deleting or removing all the column values except first value and the duplicate column value should be removed in the sense that column should become empty.I don't want to drop entire row.

The below code i have tried:

df_final.drop_duplicates(subset ="col4")

but its removing all duplicates along with rows, but i want result in above mentioned format.


Solution

  • This is Series.mask with Series.duplicated:

    df_final['col4'] = df_final['col4'].mask(df_final['col4'].duplicated())
    
        col1  col2 col3 col4
    0  afsdd  asdf  fff    d
    1      b     f    m  NaN
    2   dfdf     d   dd  NaN
    3     df    ff    f    g
    4     dd    dl   jf  NaN
    5    ddd   dll  dfd    h