Tags: r, select, compare, delete-row

How to compare two factor levels in one column of a data frame, grouped by the values in another column, and delete the rows that don't match


I am trying to compare the two levels of `light` (0 and 1) within each value of another column (`date` in this case). If a date does not have both a 0 and a 1, I would like to delete its rows.

Example:

> data
 light date
1 0    20190314
2 0    20190317
3 1    20190314
4 0    20190318
5 1    20190316
6 1    20190318
7 1    20190314

So I would like the result to be:

> head(data)
 light date
1 0    20190314
2 1    20190314
3 0    20190318
4 1    20190318
5 1    20190314

Thanks in advance


Solution

  • Here is one solution.

    Input

    library(tibble)  # provides tribble()
    d <- tribble(~light, ~date,
      "0", "20190314",
      "0", "20190317",
      "1", "20190314",
      "0", "20190318",
      "1", "20190316",
      "1", "20190318",
      "1", "20190314"
    )
    

    Code

    library(dplyr)
    d %>%
      group_by(date) %>%  # work on each date separately
      mutate(is_keep = if_else("0" %in% light & "1" %in% light, 1, 0)) %>%  # 1 if this date has both a "0" and a "1", else 0
      filter(is_keep == 1) %>%  # keep only the rows of dates that have both values
      select(-is_keep) %>%      # drop the temporary helper column
      ungroup()                 # return an ungrouped tibble
    

    Output

      light date    
      <chr> <chr>   
    1 0     20190314
    2 1     20190314
    3 0     20190318
    4 1     20190318
    5 1     20190314
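If you would rather not depend on dplyr, here is a minimal base R sketch of the same idea. It assumes `light` holds the values 0 and 1 and `date` is numeric, as in the question's sample; the names `dates_with_0`, `dates_with_1`, `keep_dates` and `result` are just illustrative. It collects the dates that occur at least once with each value, then keeps every row whose date is in both sets:

```r
# Sample data from the question (assumed: light is 0/1, date is numeric)
data <- data.frame(
  light = c(0, 0, 1, 0, 1, 1, 1),
  date  = c(20190314, 20190317, 20190314, 20190318,
            20190316, 20190318, 20190314)
)

# Dates that appear at least once with light == 0, and at least once with light == 1
dates_with_0 <- data$date[data$light == 0]
dates_with_1 <- data$date[data$light == 1]

# Only dates present in both sets have both values
keep_dates <- intersect(dates_with_0, dates_with_1)

# Keep every row whose date qualifies; the original row order is preserved
result <- data[data$date %in% keep_dates, ]
rownames(result) <- NULL  # renumber rows 1..n like the desired output
print(result)
```

With dplyr, the helper column in the answer can also be skipped: after `group_by(date)`, a single `filter(all(c("0", "1") %in% light))` keeps only the groups that contain both values.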