
Find which column has unique values that can help distinguish the rows with Pandas


I have the following dataframe, which contains 2 rows:

index  name      food   color   number   year   hobby  music
0      Lorenzo   pasta  blue     5        1995  art    jazz
1      Lorenzo   pasta  blue     3        1995  art    jazz

I want to write code that tells me which column distinguishes between these two rows.
For example, in this dataframe, the column "number" is the one that distinguishes between the two rows.

Until now I have done this very simply by going over the columns one by one with iloc and looking at the values:

duplicates.iloc[:,3]
>>>
0  blue
1  blue

It's important to take into account that:

  1. This should run inside a for loop; each time, I check a newly generated dataframe.
  2. There may be more than 2 rows that I need to check.
  3. There may be more than 1 column that can distinguish between the rows.
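A minimal sketch of how the three requirements can fit together. It assumes each generated dataframe is available inside the loop; the list `generated` and the helper `distinguishing_columns` are hypothetical names used only for illustration. A column distinguishes the rows only if every row has a different value in it, which pandas' `Series.is_unique` checks directly, and returning a list covers the case of several such columns.

```python
import pandas as pd


def distinguishing_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: return every column whose values differ in each row."""
    # Series.is_unique is True when no value repeats within the column
    return [col for col in df.columns if df[col].is_unique]


# Hypothetical stand-ins for the dataframes generated on each loop iteration
generated = [
    pd.DataFrame({
        "name": ["Lorenzo", "Lorenzo"],
        "food": ["pasta", "pasta"],
        "number": [5, 3],
    }),
    pd.DataFrame({
        "name": ["Lorenzo", "Lorenzo", "Lorenzo"],
        "number": [5, 3, 7],
        "year": [1995, 1996, 1997],
        "hobby": ["art", "art", "music"],
    }),
]

results = []
for frame in generated:
    cols = distinguishing_columns(frame)
    results.append(cols)
    print(cols)
```

With this made-up data, the first frame yields ['number'] and the three-row frame yields ['number', 'year'], since "hobby" repeats "art".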

I thought that the way to check this would be to take one column at a time, get its unique values, and check whether they are all equal to each other, similar to this:

for n in range(len(df.columns)):
    tmp = df.iloc[:, n]  # the n-th column as a Series

Then I thought to check whether all the values in this temporary column are equal to each other, but I got stuck here because sometimes I have many rows and also I need.

My end goal: inside the for loop, identify the column (or columns) that has a different value in every row of the temporary dataframe, and can therefore distinguish between the rows.
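One way to finish the column-by-column loop from the question is to compare the number of distinct values in each column with the number of rows: a column distinguishes all rows exactly when the two are equal. A minimal sketch, assuming `df` is the temporary dataframe of the current iteration (rebuilt here from the example table, with 0 and 1 as the index, so the snippet runs on its own); the name `distinct_cols` is illustrative.

```python
import pandas as pd

# Example data from the question; the row labels 0 and 1 are the index
df = pd.DataFrame({
    "name": ["Lorenzo", "Lorenzo"],
    "food": ["pasta", "pasta"],
    "color": ["blue", "blue"],
    "number": [5, 3],
    "year": [1995, 1995],
    "hobby": ["art", "art"],
    "music": ["jazz", "jazz"],
})

distinct_cols = []
for n in range(len(df.columns)):
    tmp = df.iloc[:, n]  # the n-th column as a Series
    # dropna=False counts NaN as a value, so two NaNs are treated as the same value
    if tmp.nunique(dropna=False) == len(tmp):
        distinct_cols.append(df.columns[n])

print(distinct_cols)  # ['number']
```

Collecting into a list rather than stopping at the first match keeps the case where more than one column distinguishes the rows.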


Solution

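  • You can apply the duplicated method to every column. A column is flagged if any of its values repeats, so the columns that are not flagged have a different value in every row: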

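    import pandas as pd

    # True for each column in which at least one value repeats
    s = df.apply(pd.Series.duplicated).any()

    # keep the columns with no repeats and return them as a plain list
    s[~s].index.tolist()


    Output: ['number']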
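To see why this works: df.apply(pd.Series.duplicated) builds a boolean frame that marks every value already seen earlier in the same column, .any() reduces it to one flag per column, and ~s keeps the columns with no repeats. The sketch below runs it on the example data and, as a cross-check, compares it with a nunique-based selection; that second approach is an alternative I am adding for comparison, not part of the original answer.

```python
import pandas as pd

# Example data from the question; the row labels 0 and 1 are the index
df = pd.DataFrame({
    "name": ["Lorenzo", "Lorenzo"],
    "food": ["pasta", "pasta"],
    "color": ["blue", "blue"],
    "number": [5, 3],
    "year": [1995, 1995],
    "hobby": ["art", "art"],
    "music": ["jazz", "jazz"],
})

# Boolean frame: True where a value repeats an earlier value in its column
dup_mask = df.apply(pd.Series.duplicated)
s = dup_mask.any()             # one flag per column: does any value repeat?
answer = s[~s].index.tolist()  # columns with a different value in every row

# Alternative: as many distinct values as rows means every row differs
nu = df.nunique(dropna=False)
alt = nu[nu == len(df)].index.tolist()

print(answer)  # ['number']
print(alt)     # ['number']
```

Both approaches treat repeated NaNs as equal values, so they should agree on frames with missing data as well.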