Tags: python, pandas, duplicates, data-cleaning, data-comparison

How to compare rows and show the reason for their uniqueness in a new column?


I have the following data:

  Name  Age Country  Occupation      Hobby
0    A   23      DE  Job holder    Fishing
1    A   23      DE  Job holder  Gardening
2    A   23      DE     Student    Fishing
3    A   23      DE  Job holder    Reading
4    B   15      SW  Job holder    Fishing
5    B   15      SW  Job holder    Playing
6    C   23      DD  Job holder     Coding
7    B   23      AA  Job holder    Fishing
8    D   34      GH  Job holder    Fishing
9    D   33      TR       Other    Fishing
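
For reproducibility, the same data can be rebuilt as a DataFrame (transcribed from the table above):

import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'B', 'D', 'D'],
    'Age': [23, 23, 23, 23, 15, 15, 23, 23, 34, 33],
    'Country': ['DE', 'DE', 'DE', 'DE', 'SW', 'SW', 'DD', 'AA', 'GH', 'TR'],
    'Occupation': ['Job holder', 'Job holder', 'Student', 'Job holder', 'Job holder',
                   'Job holder', 'Job holder', 'Job holder', 'Job holder', 'Other'],
    'Hobby': ['Fishing', 'Gardening', 'Fishing', 'Reading', 'Fishing',
              'Playing', 'Coding', 'Fishing', 'Fishing', 'Fishing'],
})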

When there are duplicate values in the "Name" column, I would like to compare those rows and find out why they are not identical. I want to add a new column that lists the header names of the columns responsible for the differences. If a name occurs only once, write 'Unique' in the Reason column.

The output should look like this:

  Name  Age Country  Occupation      Hobby                    Reason
0    A   23      DE  Job holder    Fishing         Occupation, Hobby
1    A   23      DE  Job holder  Gardening         Occupation, Hobby
2    A   23      DE     Student    Fishing         Occupation, Hobby
3    A   23      DE  Job holder    Reading         Occupation, Hobby
4    B   15      SW  Job holder    Fishing                     Hobby
5    B   15      SW  Job holder    Playing                     Hobby
6    C   23      DD  Job holder     Coding                    Unique
7    B   23      AA  Job holder    Fishing              Age, Country
8    D   34      GH  Job holder    Fishing  Age, Country, Occupation
9    D   33      TR       Other    Fishing  Age, Country, Occupation

I tried this:

dif = [i for i, (x, y) in enumerate(zip(df.loc[0].values, df.loc[9, :].values)) if x != y]
df.iloc[:, dif]

but it does not produce the output shown above. I am new to Python, so any help would be appreciated!


Solution

  • You can identify, for each name, the columns that are not unique, then join their names into a single string.

    There are multiple ways to do this.

    Using groupby.nunique and dot product:

    # number of distinct values per column within each Name group
    df2 = df.groupby('Name').nunique()
    # columns with more than one distinct value, joined into one string per Name
    df['Reason'] = df['Name'].map(df2.gt(1).dot(df2.columns+', ')
                                     .str[:-2].replace('', 'Unique'))
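
    To see what the dot-product step actually does, here is a sketch of the intermediates it produces for the sample data above (the commented frames are approximate renderings):

    # df2 holds the number of distinct values per column within each Name group
    df2 = df.groupby('Name').nunique()

    # df2.gt(1) flags the columns that vary within a group, e.g.:
    #         Age  Country  Occupation  Hobby
    # Name
    # A     False    False        True   True
    # B      True     True       False   True
    # C     False    False       False  False
    # D      True     True        True  False
    #
    # Dotting that boolean frame with df2.columns + ', ' multiplies each column
    # name by True/False (True * 'Age, ' == 'Age, ', False * 'Age, ' == '') and
    # concatenates across the row, giving e.g. 'Occupation, Hobby, ' for A;
    # .str[:-2] trims the trailing ', ' and C's empty string becomes 'Unique'.
    print(df2.gt(1).dot(df2.columns + ', ').str[:-2].replace('', 'Unique'))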
    

    With groupby.nunique and melt:

    # reshape to long format: one row per (Name, column, value) pair
    df2 = df.melt('Name')
    # sort=False keeps the joined column names in the original column order
    df['Reason'] = (df['Name']
                    .map(df2.groupby(['Name', 'variable'], as_index=False, sort=False)['value']
                            .nunique()
                            .query('value > 1')  # keep only the columns that vary within a Name
                            .groupby('Name')['variable'].agg(', '.join))
                    .fillna('Unique')  # names that occur only once have no match
                   )
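
    For context, df.melt('Name') turns the frame into long format, one row per (Name, column, value) pair, which the nested groupby then counts; a rough sketch of the first rows:

    print(df.melt('Name').head())
    #   Name variable value
    # 0    A      Age    23
    # 1    A      Age    23
    # 2    A      Age    23
    # 3    A      Age    23
    # 4    B      Age    15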
    

    With groupby.apply:

    # for each group, join the names of the columns with more than one distinct value
    df['Reason'] = df['Name'].map(df.groupby('Name')
                                    .apply(lambda g: ', '.join(g.columns[g.nunique().gt(1)]))
                                    .replace('', 'Unique')
                                 )
    

    Output:

      Name  Age Country  Occupation      Hobby                    Reason
    0    A   23      DE  Job holder    Fishing         Occupation, Hobby
    1    A   23      DE  Job holder  Gardening         Occupation, Hobby
    2    A   23      DE     Student    Fishing         Occupation, Hobby
    3    A   23      DE  Job holder    Reading         Occupation, Hobby
    4    B   15      SW  Job holder    Fishing       Age, Country, Hobby
    5    B   15      SW  Job holder    Playing       Age, Country, Hobby
    6    C   23      DD  Job holder     Coding                    Unique
    7    B   23      AA  Job holder    Fishing       Age, Country, Hobby
    8    D   34      GH  Job holder    Fishing  Age, Country, Occupation
    9    D   33      TR       Other    Fishing  Age, Country, Occupation
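
    If you need the same logic for other frames or key columns, the first approach can be wrapped in a small helper (a sketch; add_reason is just an illustrative name, not an existing API):

    def add_reason(df, key='Name', label='Unique'):
        """Return a copy of df with a 'Reason' column listing the columns
        whose values vary within each `key` group."""
        nun = df.groupby(key).nunique()
        reason = nun.gt(1).dot(nun.columns + ', ').str[:-2].replace('', label)
        return df.assign(Reason=df[key].map(reason))

    out = add_reason(df)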