
R: grepl with NA's


I am working with the R programming language.

I have the following dataset:

file = data.frame(id = c(1,2,3,4,5), col1 = c("Red", "Blue", "CCC", "Yellow", "Orange"), col2 = c("AAA", "BBB", "CCC", "DDD", "Red"))


  id   col1 col2
1  1    Red  AAA
2  2   Blue  BBB
3  3    CCC  CCC
4  4 Yellow  DDD
5  5 Orange  Red

I would like to replace every cell that contains "CCC" or "Red" (a substring match, like SQL's LIKE '%CCC%') with NA. The end result should look something like this:

  id   col1 col2
1  1     NA  AAA
2  2   Blue  BBB
3  3     NA   NA
4  4 Yellow  DDD
5  5 Orange   NA

I found a similar post (Replace entire expression that contains a specific string) and tried to apply the logic presented there to my question:

step1 = file[grep("CCC", file)] <- "NA"
step2 = step1[grep("Red", step1)] <- "NA"

However, I don't think this is working - all I get is an "NA" output.

Can someone please show me how to fix this problem?


Solution

  • For an exact match, I would use ifelse along with %in%:

    file$col1 <- ifelse(file$col1 %in% c("CCC", "Red"), NA, file$col1)
    file$col2 <- ifelse(file$col2 %in% c("CCC", "Red"), NA, file$col2)
    

    For a substring match, use grepl with a regular expression, again once per column:

    file$col1 <- ifelse(grepl("CCC|Red", file$col1), NA, file$col1)
    file$col2 <- ifelse(grepl("CCC|Red", file$col2), NA, file$col2)
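    If there are many columns, repeating the line for each one gets tedious. Here is a minimal sketch of one way to apply the same grepl test to every character column at once. It assumes the columns are character vectors (the default in R 4.0 and later; stringsAsFactors = FALSE is set explicitly here so that older versions behave the same way). The names pattern and is_chr are only illustrative.

```r
# Sample data from the question, with character (not factor) columns.
file <- data.frame(
  id   = c(1, 2, 3, 4, 5),
  col1 = c("Red", "Blue", "CCC", "Yellow", "Orange"),
  col2 = c("AAA", "BBB", "CCC", "DDD", "Red"),
  stringsAsFactors = FALSE
)

# Regular expression: matches any cell containing "CCC" or "Red".
pattern <- "CCC|Red"

# Find the character columns, then set matching cells to NA in each of them.
is_chr <- vapply(file, is.character, logical(1))
file[is_chr] <- lapply(file[is_chr], function(x) {
  x[grepl(pattern, x)] <- NA_character_
  x
})

print(file)
```

    As for why the original attempt returns "NA": grep() applied to a whole data frame matches against entire columns rather than individual cells, and the chained assignment step1 = file[...] <- "NA" stores the assigned value, the string "NA", in step1. Note also that "NA" in quotes is an ordinary string, not a missing value.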