python, pandas, unique

How to mark in pandas whether a combination of column values is unique?


I have this dataframe:

import pandas as pd
data = {'name': ['Tom', 'nick', 'krish', 'jack','Tom','Tom'],
        'surname': ['smith', 'nielsen', 'hawk', 'boxer','bless','smith'],
        'job': ['boxer', 'writer', 'officer', 'driver','barman','boxer'],
        'salary': [200, 100, 300, 200,500,1000],
        }

df = pd.DataFrame(data)

print(df)

and I need to add a new column named 'is only' with these values:

'true' if the combination of values in the columns ['name','surname','job'] is unique, or 'false' if that combination appears in more than one row

like:

data_answer= {'name': ['Tom', 'nick', 'krish', 'jack','Tom','Tom'],
        'surname': ['smith', 'nielsen', 'hawk', 'boxer','bless','smith'],
        'job': ['boxer', 'writer', 'officer', 'driver','barman','boxer'],
        'salary': [200, 100, 300, 200,500,1000],
        'is only': ['false','true','true','true','true','false']
        }

data_answer = pd.DataFrame(data_answer)
print(data_answer)

    name  surname      job  salary is only
0    Tom    smith    boxer     200   false
1   nick  nielsen   writer     100    true
2  krish     hawk  officer     300    true
3   jack    boxer   driver     200    true
4    Tom    bless   barman     500    true
5    Tom    smith    boxer    1000   false

Can anyone help me find a solution?


Solution

  • There's a dedicated method for that: pandas.DataFrame.duplicated

    df["is only"] = ~df.duplicated(subset=["name", "surname", "job"], keep=False)
    

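    With keep=False, every row whose combination of name, surname and job occurs more than once is marked as a duplicate, including the first occurrence. The tilde (~) inverts the result because you are interested in the opposite of duplicates. Note that the new column holds booleans (True/False), not the strings 'true'/'false'.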
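
    For reference, here is a minimal end-to-end sketch using the dataframe from the question. The helper names `cols`, `unique_mask`, `as_text` and `first_kept` are only illustrative and do not appear in the original answer; the mapping to the strings 'true'/'false' is an optional extra step, assuming the string form shown in the question is really required.

```python
import pandas as pd

data = {'name': ['Tom', 'nick', 'krish', 'jack', 'Tom', 'Tom'],
        'surname': ['smith', 'nielsen', 'hawk', 'boxer', 'bless', 'smith'],
        'job': ['boxer', 'writer', 'officer', 'driver', 'barman', 'boxer'],
        'salary': [200, 100, 300, 200, 500, 1000],
        }
df = pd.DataFrame(data)

# Columns whose combined values decide uniqueness (illustrative name)
cols = ['name', 'surname', 'job']

# keep=False flags every member of a repeated group, so inverting it
# leaves True only for combinations that occur exactly once
unique_mask = ~df.duplicated(subset=cols, keep=False)
df['is only'] = unique_mask

# Optional: lowercase strings instead of booleans, as in the question
as_text = unique_mask.map({True: 'true', False: 'false'})

# For contrast: keep='first' does not flag the first occurrence,
# so row 0 would wrongly count as unique here
first_kept = ~df.duplicated(subset=cols, keep='first')

print(df)
```

    Rows 0 and 5 share the same name, surname and job, so they are the only rows where 'is only' is False. Keeping the column as booleans is usually more convenient for later filtering, for example `df[df['is only']]`.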