Replace cells that do not match a list of strings with NA, only for certain columns in R


I have a data frame "abnormal2" that contains many columns; I have cut it down to a few columns as an example:

   id class1 class2 class3 class4 class5
1   1 PATH   PATH   PATH   PATH   PATH
2   2 rLS          PATH     LB   <NA>
3   3 PATH   PATH   PATH   <NA>   <NA>
4   4 PATH    VUS    VUS   <NA>   <NA>
5   5 PATH    VUS    VUS   <NA>   <NA>
6   6 PATH   PATH    VUS   PATH   <NA>
7   7 MPATH    VUS    VUS   <NA>   <NA>
8   8 VUS    VUS    VUS   <NA>   <NA>
9   9 PATH    VUS    VUS   <NA>   <NA>
10  10 PATH   PATH          <NA>   <NA>

What I want is to replace any cell that does not match one of a list of strings (MPATH, VUS_LPATH, VUS_LB, PATH, VUS, LB, Normal) with NA. The replacement should apply only to columns class1 through class5; the result would look like this:

   id class1 class2 class3 class4 class5
1   1 PATH   PATH   PATH   PATH   PATH
2   2 NA    NA      PATH     LB   <NA>
3   3 PATH   PATH   PATH   <NA>   <NA>
4   4 PATH    VUS    VUS   <NA>   <NA>
5   5 PATH    VUS    VUS   <NA>   <NA>
6   6 PATH   PATH    VUS   PATH   <NA>
7   7 MPATH    VUS    VUS   <NA>   <NA>
8   8 VUS    VUS    VUS   <NA>   <NA>
9   9 PATH    VUS    VUS   <NA>   <NA>
10  10 PATH   PATH    NA     <NA>   <NA>

I used the code below, but it does not work:

sel <- grepl("class",names(abnormal2))
abnormal2[sel] <- data.frame(lapply(abnormal2[sel], function(x) gsub("[^MPATH|^VUS\\_LPATH|^VUS\\_LB|^PATH|^VUS|^LB|^Normal]","", x)))

Solution

  • If your string matches are exact (rather than requiring regex) then, using your idea as a basis, the following will work.

    sel <- grepl("class",names(abnormal2))
    
    matches <- c("MPATH", "VUS_LPATH", "VUS_LB", "PATH", "VUS", "LB", "Normal")
    
    abnormal2[sel] <- data.frame(lapply(abnormal2[sel], function(x) {
       x[!x %in% matches] <- NA   # anything not in the allowed list becomes NA
       x
    }), stringsAsFactors = FALSE)
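
    To try the approach without the real data, here is a minimal self-contained sketch on a made-up three-row data frame (`df` and its values are illustrative assumptions, not the asker's data). It also includes a regex variant, in case regex is preferred: the attempt in the question puts the alternatives inside `[...]`, which is a character class and matches single characters rather than whole words, so one possible fix is an anchored group of the form `^(A|B|C)$`.

```r
# Hypothetical sample data, loosely modeled on the table in the question
df <- data.frame(
  id     = 1:3,
  class1 = c("PATH", "rLS", "MPATH"),
  class2 = c("VUS", "", "PATH"),
  stringsAsFactors = FALSE
)

matches <- c("MPATH", "VUS_LPATH", "VUS_LB", "PATH", "VUS", "LB", "Normal")
sel <- grepl("class", names(df))   # only the class* columns are touched

# Exact matching: any value not in `matches` becomes NA
exact <- df
exact[sel] <- lapply(exact[sel], function(x) {
  x[!x %in% matches] <- NA
  x
})

# Regex variant: anchored alternation, i.e. ^(MPATH|VUS_LPATH|...)$
pattern <- paste0("^(", paste(matches, collapse = "|"), ")$")
via_regex <- df
via_regex[sel] <- lapply(via_regex[sel], function(x) {
  x[!grepl(pattern, x)] <- NA
  x
})

print(exact)
```

    On this sample both variants should give the same result; `%in%` is the simpler choice when the allowed values are fixed strings, and it avoids having to escape regex metacharacters.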