I need to remove duplicate rows from my data frame, but ONLY when the duplicates occur in consecutive rows. I tried the distinct() function, but that removes all duplicates. I need code that lets me remove duplicates only when they are consecutive, and only with respect to one specific column.
Here is an example of my data:
Subject Trial Event_type Code Time
23 VP02_RP 15 Picture face01_n 887969
24 VP02_RP 15 Sound mpossound_test5 888260
25 VP02_RP 15 Picture pospic_test5 906623
26 VP02_RP 15 Nothing ev_mnegpos_adj_onset 928623
27 VP02_RP 15 Response 15 958962
28 VP02_RP 18 Picture face01_p 987666
29 VP02_RP 18 Sound mpossound_test6 987668
30 VP02_RP 18 Picture negpic_test6 1006031
31 VP02_RP 18 Nothing ev_mposnegpos_adj_onset 1028031
32 VP02_RP 18 Response 15 1076642
33 VP02_RP 19 Response 13 1680887
As you can see, rows 32 and 33 are two consecutive Responses, and I only want to keep the first one. So I want to delete every row whose Event_type repeats the value in the row directly above it.
How should I go about this?
You can use the rleid function from data.table, which gives every run of consecutive identical Event_type values its own number; then use duplicated to keep only the first row of each run.
res <- df[!duplicated(data.table::rleid(df$Event_type)), ]
res
# Subject Trial Event_type Code Time
#23 VP02_RP 15 Picture face01_n 887969
#24 VP02_RP 15 Sound mpossound_test5 888260
#25 VP02_RP 15 Picture pospic_test5 906623
#26 VP02_RP 15 Nothing ev_mnegpos_adj_onset 928623
#27 VP02_RP 15 Response 15 958962
#28 VP02_RP 18 Picture face01_p 987666
#29 VP02_RP 18 Sound mpossound_test6 987668
#30 VP02_RP 18 Picture negpic_test6 1006031
#31 VP02_RP 18 Nothing ev_mposnegpos_adj_onset 1028031
#32 VP02_RP 18 Response 15 1076642
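To see why this works, here is a minimal sketch of what rleid returns for the Event_type values in the example. The vector event_type is typed out by hand for illustration and is not part of the original data frame code.

```r
library(data.table)  # provides rleid()

# Event_type values of rows 23 to 33 in the example (typed by hand for illustration)
event_type <- c("Picture", "Sound", "Picture", "Nothing", "Response",
                "Picture", "Sound", "Picture", "Nothing", "Response", "Response")

# One id per run of consecutive identical values
run_id <- rleid(event_type)
run_id
# ids 1 to 10, with the last id (10) appearing twice for the two trailing Responses

# duplicated() flags the second and later rows of each run, so negate it
keep <- !duplicated(run_id)
keep
# TRUE for the first ten rows, FALSE for the last one
```

Non-consecutive repeats such as the two separate "Picture" runs get different ids, which is why they survive, unlike with distinct().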
In base R, the rleid function can be reproduced with rle:
res <- df[!duplicated(with(rle(df$Event_type),rep(seq_along(values), lengths))),]
res
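For a version that can be run without the original data, here is a rough base R sketch. The data frame below is a hand-built stand-in containing only some of the example columns. The as.character() call is my own addition, on the assumption that Event_type might be stored as a factor, which rle() does not accept. The second approach (comparing each value with the one before it) is an alternative, not part of the answer above, and it assumes at least one row and no NA values in the column.

```r
# Hypothetical stand-in for the example data (subset of columns, row labels 23 to 33)
df <- data.frame(
  Trial = c(15, 15, 15, 15, 15, 18, 18, 18, 18, 18, 19),
  Event_type = c("Picture", "Sound", "Picture", "Nothing", "Response",
                 "Picture", "Sound", "Picture", "Nothing", "Response", "Response"),
  Time = c(887969, 888260, 906623, 928623, 958962, 987666,
           987668, 1006031, 1028031, 1076642, 1680887),
  row.names = 23:33,
  stringsAsFactors = FALSE
)

# as.character() is an added guard in case the column is a factor
x <- as.character(df$Event_type)

# Approach 1: run ids built from rle(), as in the answer
runs <- rle(x)
run_id <- rep(seq_along(runs$values), runs$lengths)
res <- df[!duplicated(run_id), ]

# Approach 2 (alternative): keep a row when its value differs from the previous row
keep <- c(TRUE, x[-1] != x[-length(x)])
res2 <- df[keep, ]

nrow(res)             # 10 rows remain; row 33 is dropped
identical(res, res2)  # both approaches select the same rows
```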