Replace strings with values across multiple columns at once


I need to replace strings with numbers across multiple columns. Below is a sample data set:

x <- c("Low Outlier", "High Outlier", "Novice", "Novice", "Emerging", NA, "Proficient", "Approaching")
y <- c("Novice", "Approaching", "Proficient", "Approaching", "High Outlier", "Proficient",NA, "Emerging")
z <- c("High Outlier", "Proficient", "Approaching", "Emerging", "Low Outlier", "Approaching", "Approaching", "Emerging")

sam <- cbind(x,y,z)

I need to convert "High Outlier" and "Low Outlier" to 0, leave the NAs as NA, and convert "Novice" to 1, "Emerging" to 2, "Approaching" to 3, and "Proficient" to 4.

I have tried to convert a single variable with

sam$x.r <- recode(sam$x.r,'Low Outlier'=0,'High Outlier'=0,'Novice'=1,'Emerging'=2,'Approaching'=3, 'Proficient'=4)

This produced the following warning: "Warning message: In recode.numeric(Dat17_18.1$I.E.ScoreStat, Low Outlier = 0, High Outlier = 0, : NAs introduced by coercion"

I am not sure how to recode all of the variables at once.


Solution

  • Recoding one label at a time gets repetitive quickly. Here's a simple function:

    my_replacer <- function(df, y, z) {
      # y is the pattern you want to replace (a regular expression)
      # z is the replacement
      # gsub() is applied to every column; NA values stay NA
      df <- as.data.frame(apply(df, 2, function(x) gsub(y, z, x)))
      df
    }
    my_replacer(sam, "Emerging.*", "2")
    

    Here is how I've used it:

    library(dplyr)  # ifelse() would also work, but is still repetitive

    sam <- as.data.frame(sam)

    # Convert any factor columns to character before replacing
    sam <- sam %>%
      mutate_if(is.factor, as.character)
    my_replacer(sam, "Emerging.*", "2")
    

    Result:

                   x            y            z
        1  Low Outlier       Novice High Outlier
        2 High Outlier  Approaching   Proficient
        3       Novice   Proficient  Approaching
        4       Novice  Approaching            2
        5            2 High Outlier  Low Outlier
        6         <NA>   Proficient  Approaching
        7   Proficient         <NA>  Approaching
        8  Approaching            2            2
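
A note on the pattern, since `gsub()` treats it as a regular expression: only the matched part of each string is replaced, which is why the call above uses a trailing `.*`. The sketch below uses a made-up toy vector (`labels` is hypothetical, not part of the sample data) to show the difference, plus an anchored pattern as one possible way to avoid partial matches.

```r
# Hypothetical toy vector, for illustration only
labels <- c("Low Outlier", "Novice", NA)

# Without ".*" only the matched part is replaced
partial <- gsub("Low", "0", labels)

# With ".*" the match runs to the end of the string
whole <- gsub("Low.*", "0", labels)

# Anchoring the full label matches that exact label and nothing else
anchored <- gsub("^Low Outlier$", "0", labels)

print(partial)
print(whole)
print(anchored)
```

In all three cases the `NA` entry is left as `NA`.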
    

    The other labels are replaced the same way:

    my_replacer(sam,"Novi.*","1")
                 x            y            z
    1  Low Outlier            1 High Outlier
    2 High Outlier  Approaching   Proficient
    3            1   Proficient  Approaching
    4            1  Approaching     Emerging
    5     Emerging High Outlier  Low Outlier
    6         <NA>   Proficient  Approaching
    7   Proficient         <NA>  Approaching
    8  Approaching     Emerging     Emerging
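
If the goal is to recode every label in one pass and end up with numeric columns, a named lookup vector might be less repetitive. This is a minimal base-R sketch and a different technique from `my_replacer` above; `score_key` and `sam_num` are names invented for the example, and it assumes each non-missing cell is exactly one of the six labels from the question (anything else would come out as `NA`).

```r
# Sample data from the question, built as a data frame of character columns
x <- c("Low Outlier", "High Outlier", "Novice", "Novice", "Emerging", NA, "Proficient", "Approaching")
y <- c("Novice", "Approaching", "Proficient", "Approaching", "High Outlier", "Proficient", NA, "Emerging")
z <- c("High Outlier", "Proficient", "Approaching", "Emerging", "Low Outlier", "Approaching", "Approaching", "Emerging")
sam <- data.frame(x, y, z, stringsAsFactors = FALSE)

# Hypothetical lookup table: label -> numeric score
score_key <- c("Low Outlier" = 0, "High Outlier" = 0, "Novice" = 1,
               "Emerging" = 2, "Approaching" = 3, "Proficient" = 4)

# Index the key by each column's labels; NA labels stay NA
sam_num <- sam
sam_num[] <- lapply(sam, function(col) unname(score_key[col]))
print(sam_num)
```

Assigning into `sam_num[]` keeps the data frame shape and column names while swapping each column for its numeric version.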