The data frame I am working with has multiple rows with duplicated information in the 'Data 1', 'Data 2', and 'Data 3' columns whenever the 'Time' value equals that of the row above. Each row's 'Message 1' and 'Message 2' columns hold unique information, so I don't want to delete the whole row. I would like to keep every row but clear the duplicated data wherever the 'Time' value matches the row above, keeping the first instance of the data and ideally replacing the duplicate values with an empty string ''.
Here is an example of my data set:
Time | Message 1 | Message 2 | Data 1 | Data 2 | Data 3 |
---|---|---|---|---|---|
11:38:24 | Notice | Text | A1 | A2 | A3 |
11:38:25 | Error | Text | B1 | B2 | B3 |
11:38:26 | Message | Text | C1 | C2 | C3 |
11:38:27 | Error | Text | D1 | D2 | D3 |
11:38:27 | Notice | Text | D1 | D2 | D3 |
11:38:27 | Error | Text | D1 | D2 | D3 |
11:38:28 | Message | Text | E1 | E2 | E3 |
11:38:28 | Notice | Text | E1 | E2 | E3 |
11:38:28 | Error | Text | E1 | E2 | E3 |
I have tried the following, but I receive the error "KeyError: 'cannot use a single bool to index into setitem'":
condition = df['Time'].shift(1).str == ['Time']
df.loc[condition, 'Data 1'] = ''
df.loc[condition, 'Data 2'] = ''
df.loc[condition, 'Data 3'] = ''
My goal is for the data frame to reflect the following:
Time | Message 1 | Message 2 | Data 1 | Data 2 | Data 3 |
---|---|---|---|---|---|
11:38:24 | Notice | Text | A1 | A2 | A3 |
11:38:25 | Error | Text | B1 | B2 | B3 |
11:38:26 | Message | Text | C1 | C2 | C3 |
11:38:27 | Error | Text | D1 | D2 | D3 |
11:38:27 | Notice | Text | | | |
11:38:27 | Error | Text | | | |
11:38:28 | Message | Text | E1 | E2 | E3 |
11:38:28 | Notice | Text | | | |
11:38:28 | Error | Text | | | |
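Your attempt fails because df['Time'].shift(1).str == ['Time'] compares the .str accessor object with a list, which produces a single False instead of a boolean Series, and .loc cannot use a lone bool as a row mask. Instead, you can use the shift() method to compare each 'Time' value with the value from the previous row: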
condition = df['Time'] == df['Time'].shift()
df.loc[condition, ['Data 1', 'Data 2', 'Data 3']] = ''
print(df)
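df['Time'] == df['Time'].shift() evaluates to True for every row whose 'Time' value equals that of the previous row. The first row is compared against a missing value, so it is always False, and the first row of each run of equal times keeps its data.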
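For completeness, here is a minimal, self-contained sketch of the same approach applied to the sample data from the question. The names data_cols and same_as_previous are just illustrative, and it assumes the 'Time' column holds plain strings and that rows with equal times are adjacent:

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "Time": ["11:38:24", "11:38:25", "11:38:26",
             "11:38:27", "11:38:27", "11:38:27",
             "11:38:28", "11:38:28", "11:38:28"],
    "Message 1": ["Notice", "Error", "Message",
                  "Error", "Notice", "Error",
                  "Message", "Notice", "Error"],
    "Message 2": ["Text"] * 9,
    "Data 1": ["A1", "B1", "C1", "D1", "D1", "D1", "E1", "E1", "E1"],
    "Data 2": ["A2", "B2", "C2", "D2", "D2", "D2", "E2", "E2", "E2"],
    "Data 3": ["A3", "B3", "C3", "D3", "D3", "D3", "E3", "E3", "E3"],
})

# Columns whose repeated values should be blanked out (illustrative name).
data_cols = ["Data 1", "Data 2", "Data 3"]

# True where a row's 'Time' equals the 'Time' of the row directly above it.
# The first row is compared with a missing value, so it is False.
same_as_previous = df["Time"].eq(df["Time"].shift())

# Blank the data columns on those rows only; the message columns are untouched.
df.loc[same_as_previous, data_cols] = ""

print(df)
```

Note that this only compares each row with the one directly above it, so if the same time can reappear later after other times in between, those later rows would keep their data.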