Pandas comparing two rows in a database


I have a DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([['apple', 'golden', 3], ['apple', 'green', 6], ['banana', 'golden', 9], ['apple', 'golden', 5], ['apple', 'green', 6], ['banana', 'golden', 6]]),
                   columns=['Column1', 'Column2', 'Column3'])
df

    Column1 Column2 Column3
0   apple   golden  3
1   apple   green   6
2   banana  golden  9
3   apple   golden  5
4   apple   green   6
5   banana  golden  6

I want to compare each "Column1" value with the value in the previous row and store the result in a new Column4. If the value differs from the previous row, I want to write True; if not, False.

    Column1 Column2 Column3 Column4
0   apple   golden  3       False
1   apple   green   6       False
2   banana  golden  9       True
3   apple   golden  5       True
4   apple   green   6       False
5   banana  golden  6       True

Lastly, wherever the comparison result is True, I want to add the Column1 item to a list.

list = ['banana']

Solution

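  • Compare Column1 with its shifted values using ne (not equal). The first shifted value is NaN, so replace it with the original Column1 value via fillna; that way the first row compares as equal: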

    df['Column4'] = df.Column1.shift().fillna(df.Column1).ne(df.Column1)
    
    print (df)
      Column1 Column2  Column3  Column4
    0   apple  golden        3    False
    1   apple   green        6    False
    2  banana  golden        9     True
    3   apple  golden        5     True
    4   apple   green        6    False
    5  banana  golden        6     True
    

    For the list, don't use list as the variable name, because it shadows the Python built-in:

    L = df.loc[df['Column4'], 'Column1'].unique().tolist()
    print (L)
    ['banana', 'apple']
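
For reference, the two steps above can be combined into one self-contained sketch. It assumes the sample data is built from a dict rather than np.array (so Column3 stays numeric), and the names previous and changed are only illustrative:

```python
import pandas as pd

# Sample data from the question, built from a dict so Column3 keeps its integer dtype.
df = pd.DataFrame(
    {
        "Column1": ["apple", "apple", "banana", "apple", "apple", "banana"],
        "Column2": ["golden", "green", "golden", "golden", "green", "golden"],
        "Column3": [3, 6, 9, 5, 6, 6],
    }
)

# shift() moves Column1 down by one row, so each row sees the previous row's value.
previous = df["Column1"].shift()

# The first row has no predecessor (NaN); fill it with its own value so it compares as equal.
df["Column4"] = previous.fillna(df["Column1"]).ne(df["Column1"])

# Collect the Column1 values on rows where a change was detected, without duplicates.
changed = df.loc[df["Column4"], "Column1"].unique().tolist()

print(df)
print(changed)
```

unique() keeps values in order of first appearance, so here the result starts with banana (row 2) followed by apple (row 3).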