
Delete all rows based on corresponding values in multiple columns


Edited question:

I would like to create a new data frame by filtering an existing one on multiple conditions. I tried the code from two related questions ("Subset data frame based on multiple conditions" and "Remove group from data.frame if at least one group member meets condition").

A small portion of the full dataset:

df<- structure(list(pat_id = c(10302, 10302, 10302, 
                          10482, 10482,10482,
                          10613, 10613, 10613, 
                          16190, 16190, 16190, 
                          16220, 16220,16220, 16220, 16220, 16220, 16220, 16220), 
               date = c("2014-04-22","2018-12-13", "2020-07-27", "2019-07-15", "2019-09-19", "2019-09-23", 
                         "2015-09-29", "2015-10-06", "2015-11-20", "2013-07-08", "2018-01-30", 
                         "2020-01-09", "2016-06-15", "2018-02-23", "2019-02-14", "2019-08-09", 
                         "2020-03-02", "2020-07-03", "2020-11-09", "2020-12-16"), 
               number = c(1,2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 8), 
               col1 = c(0,1, 1, 2, 4, 4, 9, 3, 1, 0, 1, 1, 9, 9, 9, 9, 9, 9, 9, 9), 
               col2 = c(NA_real_,NA_real_, NA_real_, 0, 1, NA_real_, NA_real_, NA_real_, 
                        NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                        NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), 
               col3 = c(NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                        NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                        NA_real_, NA_real_, NA_real_, NA_real_, NA_real_)), 
                class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
               pat_id = c(10302, 10482, 10613, 16190, 16220), .rows = structure(list(
                        1:3, 4:6, 7:9, 10:12, 13:20), ptype = integer(0), class = c("vctrs_list_of", 
                        "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
                        ), row.names = c(NA, -5L), .drop = TRUE))

I want to create a new dataframe based on the following conditions.

If number is 1 or 2 AND any of col1, col2 or col3 is 1, then delete all rows that share that row's pat_id value.

Desired output:

id        date    number    col1     col2     col3
10613      ..      1         9        NA       NA
10613      ..      2         3        NA       NA
10613      ..      3         1        NA       NA
etc

I've tried df1 <- df %>% group_by(pat_id) %>% filter(any(!(number <= 2 & (col1 == 1 | col2==1 | col3==1))))

But this does not seem to work. Could it be because of the class/structure of the data frame? I can't figure it out. If I create a 'dummy' data frame with similar columns, this code does work, but it does not work on the big dataset.

Any tips?


Solution

  • First of all, make sure your number column is numeric. After that you can group_by id and filter, keeping only the groups in which every row satisfies your condition, like this:

    library(dplyr)
    
    df %>%
      group_by(id) %>%
      filter(all(number > 1))
    #> # A tibble: 3 × 2
    #> # Groups:   id [2]
    #>   id    number
    #>   <chr>  <dbl>
    #> 1 12         2
    #> 2 13         2
    #> 3 13         3
    

    Created on 2023-08-16 with reprex v2.0.2


    Data used:

    id <- c('10','10','10','11', '11', '12', '13', '13', '14', '15', '15')
    number <- c(1, 2,3, 1, 2, 2, 2, 3,1 ,1,2)
    df <- data.frame(id, number)
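
    The same idea can be adapted to the edited question. What follows is only a sketch under some assumptions: it uses a small stand-in data frame with the question's column names (pat_id, number, col1, col2, col3) rather than the full dataset, and it swaps == for %in% so that NA values count as "no match" instead of propagating NA into the filter. Note that any(!condition) keeps a group as soon as one row fails the condition, which is true for almost every group; to drop a group when at least one row matches, negate the whole any() instead.

```r
library(dplyr)

# Stand-in for the question's data: same column names, only three patients.
df <- data.frame(
  pat_id = c(10302, 10302, 10302, 10482, 10482, 10482, 10613, 10613, 10613),
  number = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  col1   = c(0, 1, 1, 2, 4, 4, 9, 3, 1),
  col2   = c(NA, NA, NA, 0, 1, NA, NA, NA, NA),
  col3   = NA_real_
)

kept <- df %>%
  group_by(pat_id) %>%
  # A row is a "hit" when number is 1 or 2 and any of col1..col3 equals 1.
  # %in% returns FALSE (never NA) for missing values, so NAs are not hits.
  # !any(...) drops the whole group as soon as one row is a hit.
  filter(!any(number %in% c(1, 2) &
                (col1 %in% 1 | col2 %in% 1 | col3 %in% 1))) %>%
  ungroup()

# Only 10613 remains: its single col1 == 1 occurs at number 3.
unique(kept$pat_id)
```

    If the real columns were read in as character (for example "1" instead of 1), converting them with as.numeric() first, as suggested above, should make the comparison behave the same way.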