
How to delete cell values in row based on condition of a cell above in a different column?


The DataFrame I am working with has multiple rows in which the 'Data 1', 'Data 2', and 'Data 3' columns duplicate the row above whenever the 'Time' column equals the 'Time' of the row above. Each row's 'Message 1' and 'Message 2' columns hold unique information, so I don't want to delete the whole row. I would like to keep every row but clear the duplicated data when 'Time' matches the row above, keeping the first instance of the data and ideally replacing the duplicate values with an empty string ''.

Here is an example of my data set:

Time      Message 1  Message 2  Data 1  Data 2  Data 3
11:38:24  Notice     Text       A1      A2      A3
11:38:25  Error      Text       B1      B2      B3
11:38:26  Message    Text       C1      C2      C3
11:38:27  Error      Text       D1      D2      D3
11:38:27  Notice     Text       D1      D2      D3
11:38:27  Error      Text       D1      D2      D3
11:38:28  Message    Text       E1      E2      E3
11:38:28  Notice     Text       E1      E2      E3
11:38:28  Error      Text       E1      E2      E3

I have tried the following, but I receive the error "KeyError: 'cannot use a single bool to index into setitem'":

condition = df['Time'].shift(1).str == ['Time']
df.loc[condition, 'Data 1'] = ''
df.loc[condition, 'Data 2'] = ''
df.loc[condition, 'Data 3'] = ''

My goal is for the data frame to reflect the following:

Time      Message 1  Message 2  Data 1  Data 2  Data 3
11:38:24  Notice     Text       A1      A2      A3
11:38:25  Error      Text       B1      B2      B3
11:38:26  Message    Text       C1      C2      C3
11:38:27  Error      Text       D1      D2      D3
11:38:27  Notice     Text
11:38:27  Error      Text
11:38:28  Message    Text       E1      E2      E3
11:38:28  Notice     Text
11:38:28  Error      Text

Solution

  • You can use the shift() method to compare each 'Time' value with the value from the previous row:

    condition = df['Time'] == df['Time'].shift()
    df.loc[condition, ['Data 1', 'Data 2', 'Data 3']] = ''
    print(df)
    

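    df['Time'] == df['Time'].shift() evaluates to True for each row whose 'Time' value is the same as the previous row's. The first row is compared against NaN, so it is always False and keeps its data. The original attempt fails because df['Time'].shift(1).str == ['Time'] compares the .str accessor object with a list, which yields a single False rather than a boolean Series, and .loc cannot use a single bool as a row indexer in an assignment.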
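
For anyone who wants to try the shift() comparison end to end, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the sample table in the question, and the names data_cols and same_as_previous are only illustrative; it also assumes the rows are already ordered so that equal 'Time' values sit next to each other, as in the sample.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "Time": ["11:38:24", "11:38:25", "11:38:26",
             "11:38:27", "11:38:27", "11:38:27",
             "11:38:28", "11:38:28", "11:38:28"],
    "Message 1": ["Notice", "Error", "Message",
                  "Error", "Notice", "Error",
                  "Message", "Notice", "Error"],
    "Message 2": ["Text"] * 9,
    "Data 1": ["A1", "B1", "C1", "D1", "D1", "D1", "E1", "E1", "E1"],
    "Data 2": ["A2", "B2", "C2", "D2", "D2", "D2", "E2", "E2", "E2"],
    "Data 3": ["A3", "B3", "C3", "D3", "D3", "D3", "E3", "E3", "E3"],
})

# Illustrative name: the columns whose repeated values should be blanked.
data_cols = ["Data 1", "Data 2", "Data 3"]

# Boolean Series: True where 'Time' equals the previous row's 'Time'.
# shift() puts NaN in the first position, so the first row is always False.
same_as_previous = df["Time"] == df["Time"].shift()

# Blank the data columns only on the flagged rows; the message columns are untouched.
df.loc[same_as_previous, data_cols] = ""

print(df)
```

Because the comparison only looks at the immediately preceding row, a 'Time' value that reappears later, after a different value in between, would be treated as a new first instance and keep its data.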