
R: How to delete ID from a list of multiple strings in a longitudinal format


I had an earlier post asking how to drop an ID entirely if any of its rows contains certain strings (e.g., A or D) in the following longitudinal data frame. These are the R code examples I received in that post (from r2evans, akrun, and ThomasIsCoding, in order):

  1. d %>% group_by(id) %>% filter(!any(dx %in% c("A", "D"))) %>% ungroup()
  2. filter(d, !id %in% id[dx %in% c("A", "D")])
  3. subset(d, !ave(dx %in% c("A", "D"), id, FUN = any))

While these all worked well, I realized that I need to remove more than 600 strings (e.g., A, D, E2, F112, G203, etc.), so I created a CSV file listing these strings, without a column name. 1. Is making such a list the right approach? 2. How should I modify the R code above to use the file with the list of strings? I reviewed the other post and Google search results but could not figure out what to do in my case. I would appreciate any suggestions!

Data frame:

id   time   dx
1     1     C
1     2     B
2     1     A
2     2     B
3     1     D
4     1     G203
4     2     E1

The results I want:

id    time  dx
 1     1     C
 1     2     B

UPDATE: Tarjae's answer below resolved the issue. The following is the R code for the solution.

my_list <- read.csv("my_list.csv")

columnname
    A
    D
    E2
    F112
    G203
  1. d %>% group_by(id) %>% filter(!any(dx %in% my_list$columnname)) %>% ungroup()
  2. filter(d, !id %in% id[dx %in% my_list$columnname])
  3. subset(d, !ave(dx %in% my_list$columnname, id, FUN = any))
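If the CSV really has no header row, as first described above, read.csv() with its default header = TRUE would consume the first string as the column name, so that string would never be matched. A minimal sketch of reading such a file, assuming one string per line; the temporary file and the name codes are stand-ins for illustration, not part of the original code:

```r
# Hypothetical stand-in for the real file: one string per line, no header row.
path <- tempfile(fileext = ".csv")
writeLines(c("A", "D", "E2", "F112", "G203"), path)

# header = FALSE keeps the first line ("A") as data; R names the column V1.
# colClasses = "character" stops read.csv() from guessing another type.
codes <- read.csv(path, header = FALSE, colClasses = "character")$V1

# Strip stray whitespace so that "A " in the file still matches "A" in dx.
codes <- trimws(codes)

length(codes)     # number of strings read from the file
"A" %in% codes    # TRUE only if the first line was kept as data
```

The resulting character vector can then take the place of my_list$columnname in any of the three lines above.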

Solution

  • This is a good strategy:

    Put your values in a vector or list (here, my_list), then filter the dx column by negating the %in% operator with !:

    library(dplyr)
    
    my_list <- c("A", "D")
    
    df %>% 
      filter(!dx %in% my_list)
    
      id time   dx
    1  1    1    C
    2  1    2    B
    3  2    2    B
    4  4    1 G203
    5  4    2   E1
    

    Expanding the list of values to my_list <- c("A", "D", "G203", "E1")

    gives, with the same code:

    library(dplyr)
    
    df %>% 
      filter(!dx %in% my_list)
    
      id time dx
    1  1    1  C
    2  1    2  B
    3  2    2  B
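
    Note that this filter works row by row, so id 2 keeps its B row even though it also had an A row. If every row of a matching id should go, as in the result shown in the question, the same vector can be used in a per-id check. A minimal base R sketch, assuming the question's data frame d; the names row_wise, flagged and id_wise are only for illustration:

```r
# The question's example data.
d <- data.frame(
  id   = c(1, 1, 2, 2, 3, 4, 4),
  time = c(1, 2, 1, 2, 1, 1, 2),
  dx   = c("C", "B", "A", "B", "D", "G203", "E1"),
  stringsAsFactors = FALSE
)
my_list <- c("A", "D", "G203", "E1")

# Row-wise: drops only the rows whose dx is in my_list.
row_wise <- d[!d$dx %in% my_list, ]

# Id-wise: ave() applies any() within each id, so a single match
# flags every row belonging to that id.
flagged <- ave(d$dx %in% my_list, d$id, FUN = any)
id_wise <- d[!flagged, ]

row_wise$id   # ids 1, 1, 2: id 2 survives with its B row
id_wise$id    # ids 1, 1: only the id with no match remains
```

    This is the same idea as the subset()/ave() line in the question, with the hard-coded c("A", "D") swapped for the vector.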