Tags: r, string, dplyr, pull, grepl

Pull subject IDs after detecting characters matching a string


Please help me pull the subject IDs of participants whose codes do not contain a specified string. For example:

data:

df <- structure(
  list(
    subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245", "191-2365"),
    edta_codes = c("4EDTA-3M783316", "4EDTA-3M2897865", "4EDTA-M280934", "4EDTA-3M286549", "MCF -3M289684", "NA")
  ),
  class = "data.frame",
  row.names = c(NA, -6L)
)

Code to test if character is in string:

df$edta_codes[!grepl("4EDTA-3", df$edta_codes)]

A different method, using stringr:

str_detect(df$edta_codes,"4EDTA-3")

Both give me the result I want, but from here I want to show the subject IDs whose codes do not contain the specified string, including those with NA (in this case, 191-3457, 191-1245, and 191-2365). I tried using pull after each of the snippets above, but neither worked.

Please help.


Solution

  • You can simply index the data frame with the negated grepl result and select the subject_id column:

    df[!grepl("4EDTA-3", df$edta_codes),'subject_id']
    #[1] "191-3457" "191-1245" "191-2365"
    

    If you also want to return the codes, keep all columns:

    df[!grepl("4EDTA-3", df$edta_codes),]
    
    #  subject_id    edta_codes
    #3   191-3457 4EDTA-M280934
    #5   191-1245 MCF -3M289684
    #6   191-2365            NA
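
Since the question asks about pull specifically, here is a minimal dplyr sketch of the same idea. It makes two assumptions that are not in the original post: the last code is a real missing value (NA) rather than the string "NA", and the pattern is matched literally with fixed = TRUE. grepl returns FALSE for missing values, so negating it keeps the NA row. stringr's str_detect returns NA for missing input, and filter drops rows where the condition is NA, so that route would need an explicit is.na() check.

```r
library(dplyr)

# Same data as in the question, except the last code is a real NA
# (an assumption for illustration; the question stores the string "NA").
df <- data.frame(
  subject_id = c("191-5467", "191-6784", "191-3457",
                 "191-0987", "191-1245", "191-2365"),
  edta_codes = c("4EDTA-3M783316", "4EDTA-3M2897865", "4EDTA-M280934",
                 "4EDTA-3M286549", "MCF -3M289684", NA),
  stringsAsFactors = FALSE
)

# Keep rows whose code does not contain the literal string "4EDTA-3",
# then extract the subject_id column as a plain character vector.
ids <- df %>%
  filter(!grepl("4EDTA-3", edta_codes, fixed = TRUE)) %>%
  pull(subject_id)

ids
```

pull works on a data frame, which is likely why calling it on the character or logical vectors returned by the snippets in the question did not work.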