
Filtering subjects based on having samples at specific time points


I have a data frame with multiple samples collected from different individuals. It looks like this:

ID Subject Week
ID01 S01 Week_2
ID02 S01 Week_4
ID03 S01 Week_5
ID04 S02 Week_3
ID05 S03 Week_2
ID06 S03 Week_4
ID07 S04 Week_1
ID08 S04 Week_4
ID09 S04 Week_5

Using dplyr, I want to remove every subject (along with all of its samples) that does not have both the Week_4 and Week_5 collection time points, leaving

ID Subject Week
ID01 S01 Week_2
ID02 S01 Week_4
ID03 S01 Week_5
ID07 S04 Week_1
ID08 S04 Week_4
ID09 S04 Week_5

at the end.


Solution

  • You can do this with all() inside filter(), evaluated separately for each Subject:

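    library(dplyr)  # the .by argument requires dplyr >= 1.1.0

    # Weeks that every retained Subject must have
    keep_week <- c("Week_4", "Week_5")

    # Keep all rows of a Subject only if both required weeks appear in its Week values
    df %>% filter(all(keep_week %in% Week), .by = Subject)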
    
    #    ID Subject   Week
    #1 ID01     S01 Week_2
    #2 ID02     S01 Week_4
    #3 ID03     S01 Week_5
    #4 ID07     S04 Week_1
    #5 ID08     S04 Week_4
    #6 ID09     S04 Week_5
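
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the example data from the question and applies the same per-subject check. It assumes an older dplyr may be installed, so it uses group_by() and ungroup() in place of .by. The object name result is only illustrative and does not appear in the original answer.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  ID      = sprintf("ID%02d", 1:9),
  Subject = c("S01", "S01", "S01", "S02", "S03", "S03", "S04", "S04", "S04"),
  Week    = c("Week_2", "Week_4", "Week_5", "Week_3", "Week_2", "Week_4",
              "Week_1", "Week_4", "Week_5")
)

keep_week <- c("Week_4", "Week_5")

# Within each Subject, keep_week %in% Week yields one TRUE/FALSE per required
# week; all() is TRUE only when every required week is present, and that
# single value keeps or drops all rows of the group.
result <- df %>%
  group_by(Subject) %>%
  filter(all(keep_week %in% Week)) %>%
  ungroup()

# Subjects that survive the filter
unique(result$Subject)
```

S03 is dropped even though it has a Week_4 sample, because all() requires every entry of keep_week to be present, not just one of them. Replacing all() with any() would keep subjects that have at least one of the listed weeks.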