Tags: r, regex, gsub, negation

Negation of gsub | Replace everything except strings in a certain vector


I have a vector of strings:

ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")

I want to keep only three possible values in this vector: N, A, and NA.

Therefore, I want to replace any element that is NOT N or A with NA.

How can I achieve this?

I have tried the following:

gsub(ve, pattern = '[^NA]+', replacement = 'NA')
gsub(ve, pattern = '[^N|^A]+', replacement = 'NA')

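But these don't work well, because gsub rewrites the non-matching characters inside each string instead of replacing the whole element: every run of characters other than "N" or "A" becomes NA, while elements such as "ANN" and "NN" are left untouched. So in some cases I end up with strings like NNANNANAA, instead of simply NA.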
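
For illustration, here is a small check of what the first attempt returns on a few of the elements. This is only a sketch; `sample_ve` and `attempt` are names introduced here and are not part of the original code. The character class `[^NA]` matches any single character other than N or A, so gsub edits pieces inside each string rather than testing the element as a whole.

```r
# A few elements taken from ve; sample_ve is a hypothetical name used only here
sample_ve <- c("N", "ANN", "NFNFNAA", "23", "parnot")

# [^NA]+ matches runs of characters that are neither "N" nor "A"
attempt <- gsub(pattern = "[^NA]+", replacement = "NA", x = sample_ve)

print(attempt)
# "N" "ANN" "NNANNANAA" "NA" "NA"
```

"ANN" passes through unchanged because it contains only N and A, and in "NFNFNAA" only the two F characters are replaced.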


Solution

  • If we are looking for fixed (whole-element) matches, use %in% with negation (!) to select the elements that are not in the allowed set, and assign 'NA' to them

    ve[!ve %in% c("A", "N", "NA")] <- 'NA'
    

    Note that in R, a missing value is the unquoted NA, not the quoted string 'NA'. If 'NA' here is meant to be a category of its own, I would advise giving it a different name to avoid confusion when the data is parsed later
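
To make the difference between the string 'NA' and a true missing value concrete, here is a minimal sketch that applies the %in% approach both ways. The names `as_string` and `as_missing` are introduced here for illustration only. In the second variant the allowed set is assumed to be just "A" and "N", so the existing "NA" string also becomes missing.

```r
ve <- c("N", "A", "A", "A", "N", "ANN", "NA", "NFNFNAA", "23", "N", "A", "NN",
        "parnot", "important", "notall")

# Variant 1: keep "NA" as an ordinary string category (as in the answer)
as_string <- ve
as_string[!as_string %in% c("A", "N", "NA")] <- "NA"

# Variant 2: use a real missing value instead of the string "NA"
as_missing <- ve
as_missing[!as_missing %in% c("A", "N")] <- NA

sum(as_string == "NA")   # elements holding the string "NA"
sum(is.na(as_string))    # no real missing values in variant 1
sum(is.na(as_missing))   # real missing values in variant 2
```

With the real missing value, functions such as is.na() recognize the replaced elements, which the quoted string does not allow.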