
Comparing two values of the same variable in a single DataFrame


I have a data frame as follows:

Obs.  ID   Name  type
1)    123  abc   duplicate
2)    123  abc   duplicate
3)    145  abc   abc
4)    156  abc   duplicate
5)    156  abc   duplicate

If the ID is the same, as in obs. 1 and 2 or 4 and 5, I want to create a new variable type set to duplicate; otherwise type should take the value of the Name variable (i.e. abc).


Solution

  • We can use duplicated with np.where to set the values according to the result:

    import numpy as np

    # keep=False flags every row whose ID repeats, not only the later ones
    df['type'] = np.where(df.duplicated('ID', keep=False), 'Duplicate', 'Single')
    print(df)
    
      Obs.   ID Name       type
    0   1)  123  abc  Duplicate
    1   2)  123  abc  Duplicate
    2   3)  145  abc     Single
    3   4)  156  abc  Duplicate
    4   5)  156  abc  Duplicate
    

    For the updated requirement (use the Name value when the ID is unique), you just need a simple tweak:

    df['type'] = np.where(~df.duplicated('ID', keep=False), df['Name'], 'Duplicate')
    
    print(df)
    
      Obs.   ID Name       type
    0   1)  123  abc  Duplicate
    1   2)  123  abc  Duplicate
    2   3)  145  abc        abc
    3   4)  156  abc  Duplicate
    4   5)  156  abc  Duplicate
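
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The frame is rebuilt by hand from the sample in the question, and the names `is_dup` and `label` are only illustrative; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; 'type' is the column we derive.
df = pd.DataFrame({
    "Obs.": ["1)", "2)", "3)", "4)", "5)"],
    "ID": [123, 123, 145, 156, 156],
    "Name": ["abc"] * 5,
})

# keep=False marks every row whose ID occurs more than once,
# including the first occurrence of each repeated ID.
is_dup = df.duplicated("ID", keep=False)

# Variant 1: fixed labels for both cases (kept in a separate array here).
label = np.where(is_dup, "Duplicate", "Single")

# Variant 2: fall back to the row's own Name when the ID is unique.
df["type"] = np.where(is_dup, "Duplicate", df["Name"])

print(df)
```

Computing the mask once and reusing it keeps the two variants easy to compare. With the default `keep='first'`, the first row of each repeated ID would not be flagged, which is why `keep=False` matters here.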