
How to delete only consecutive duplicate rows?


I need to delete duplicates in my data frame ONLY when they occur in consecutive rows. I tried the distinct() function, but it removes all duplicates, so I need code that lets me remove duplicates only when they are consecutive, and only with respect to a specific column.

Here is an example of my data:

       Subject Trial Event_type                    Code    Time
    23 VP02_RP    15    Picture                face01_n  887969
    24 VP02_RP    15      Sound         mpossound_test5  888260
    25 VP02_RP    15    Picture            pospic_test5  906623
    26 VP02_RP    15    Nothing    ev_mnegpos_adj_onset  928623
    27 VP02_RP    15   Response                      15  958962
    28 VP02_RP    18    Picture                face01_p  987666
    29 VP02_RP    18      Sound         mpossound_test6  987668
    30 VP02_RP    18    Picture            negpic_test6 1006031
    31 VP02_RP    18    Nothing ev_mposnegpos_adj_onset 1028031
    32 VP02_RP    18   Response                      15 1076642
    33 VP02_RP    19   Response                      13 1680887

As you can see, rows 32 and 33 are two consecutive Responses, and I only want to keep the first one. So I want to delete every row whose Event_type repeats the value in the row directly above it.

How should I go about this?


Solution

  • You can use the rleid function from data.table, which gives every row in a run of consecutive identical Event_type values the same number and moves to a new number whenever the value changes. Then use duplicated to keep only the first row of each run.

    res <- df[!duplicated(data.table::rleid(df$Event_type)), ]
    res
    
    #   Subject Trial Event_type                    Code    Time
    #23 VP02_RP    15    Picture                face01_n  887969
    #24 VP02_RP    15      Sound         mpossound_test5  888260
    #25 VP02_RP    15    Picture            pospic_test5  906623
    #26 VP02_RP    15    Nothing    ev_mnegpos_adj_onset  928623
    #27 VP02_RP    15   Response                      15  958962
    #28 VP02_RP    18    Picture                face01_p  987666
    #29 VP02_RP    18      Sound         mpossound_test6  987668
    #30 VP02_RP    18    Picture            negpic_test6 1006031
    #31 VP02_RP    18    Nothing ev_mposnegpos_adj_onset 1028031
    #32 VP02_RP    18   Response                      15 1076642
    

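    In base R, the same run IDs can be built with rle. Note that rle only accepts atomic vectors, so a factor column has to be converted with as.character first -

    res <- df[!duplicated(with(rle(as.character(df$Event_type)), rep(seq_along(values), lengths))), ]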
    res
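
To make the run-ID idea concrete, here is a minimal base R sketch on rows 30 to 33 of the question's data. The data frame is rebuilt by hand as an assumption about its column types (character columns, numeric Trial and Time), and the names `runs`, `run_id`, `res_rle`, `keep` and `res_prev` are illustrative only. The second approach, comparing each value with the one in the row above, is an alternative technique and not part of the answer above.

```r
# Rows 30-33 from the question, rebuilt by hand (column types are assumed)
df <- data.frame(
  Subject    = "VP02_RP",
  Trial      = c(18, 18, 18, 19),
  Event_type = c("Picture", "Nothing", "Response", "Response"),
  Code       = c("negpic_test6", "ev_mposnegpos_adj_onset", "15", "13"),
  Time       = c(1006031, 1028031, 1076642, 1680887),
  row.names  = 30:33,
  stringsAsFactors = FALSE
)

# Run IDs: each stretch of identical consecutive values shares one number
runs   <- rle(as.character(df$Event_type))
run_id <- rep(seq_along(runs$values), runs$lengths)

# Approach 1: keep only the first row of each run
res_rle <- df[!duplicated(run_id), ]

# Approach 2 (alternative): keep a row when its value differs from the row above.
# Assumes at least one row and no NA values in Event_type.
n        <- nrow(df)
keep     <- c(TRUE, df$Event_type[-1] != df$Event_type[-n])
res_prev <- df[keep, ]

print(run_id)
print(res_rle)
```

Both approaches drop row 33 here and keep rows 30, 31 and 32. A Response that appears again later, after some other event type, starts a new run and is therefore kept, which is the difference from distinct().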