I have this dataframe:
import pandas as pd
data = {'name': ['Tom', 'nick', 'krish', 'jack','Tom','Tom'],
'surname': ['smith', 'nielsen', 'hawk', 'boxer','bless','smith'],
'job': ['boxer', 'writer', 'officer', 'driver','barman','boxer'],
'salary': [200, 100, 300, 200,500,1000],
}
df = pd.DataFrame(data)
print(df)
and I need to add a new column named 'is only' with these values:
'true' if the combination of values in columns ['name','surname','job'] is unique, or 'false' if that combination is not unique,
like:
data_answer= {'name': ['Tom', 'nick', 'krish', 'jack','Tom','Tom'],
'surname': ['smith', 'nielsen', 'hawk', 'boxer','bless','smith'],
'job': ['boxer', 'writer', 'officer', 'driver','barman','boxer'],
'salary': [200, 100, 300, 200,500,1000],
'is only': ['false','true','true','true','true','false']
}
data_answer = pd.DataFrame(data_answer)
print(data_answer)
    name  surname      job  salary  is only
0    Tom    smith    boxer     200    false
1   nick  nielsen   writer     100     true
2  krish     hawk  officer     300     true
3   jack    boxer   driver     200     true
4    Tom    bless   barman     500     true
5    Tom    smith    boxer    1000    false
Can anyone help me find a solution?
There's a dedicated method for that: pandas.DataFrame.duplicated
df["is only"] = ~df.duplicated(subset=["name", "surname", "job"], keep=False)
The tilde (~) inverts the result, because you want the opposite of duplicates. keep=False marks every row of a repeated combination as a duplicate, including its first occurrence.
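Note that this produces a boolean column (True/False), not the strings 'true'/'false' shown in the question. If the strings are really needed, something like the sketch below should work. It rebuilds the question's dataframe so it runs on its own; the names key_cols and is_unique and the extra column 'is only (str)' are only illustrative and not part of the original code.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "name": ["Tom", "nick", "krish", "jack", "Tom", "Tom"],
    "surname": ["smith", "nielsen", "hawk", "boxer", "bless", "smith"],
    "job": ["boxer", "writer", "officer", "driver", "barman", "boxer"],
    "salary": [200, 100, 300, 200, 500, 1000],
})

# Columns whose combined values must be unique (illustrative name)
key_cols = ["name", "surname", "job"]

# keep=False flags every row of a repeated combination;
# ~ flips it so True means "this combination occurs only once"
is_unique = ~df.duplicated(subset=key_cols, keep=False)

# Boolean version, as in the answer above
df["is only"] = is_unique

# Optional string version, matching the 'true'/'false' in the question
# (hypothetical extra column, shown only for comparison)
df["is only (str)"] = is_unique.map({True: "true", False: "false"})

print(df)
```

Unless the strings are required downstream, keeping the boolean column is usually more convenient, since it can be used directly as a filter, e.g. df[df["is only"]].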