I need to replace strings with numbers across multiple columns. Below is a sample data set:
x <- c("Low Outlier", "High Outlier", "Novice", "Novice", "Emerging", NA, "Proficient", "Approaching")
y <- c("Novice", "Approaching", "Proficient", "Approaching", "High Outlier", "Proficient",NA, "Emerging")
z <- c("High Outlier", "Proficient", "Approaching", "Emerging", "Low Outlier", "Approaching", "Approaching", "Emerging")
sam <- cbind(x,y,z)
I need to convert "High Outlier" and "Low Outlier" to 0, leave the NAs as NA, and convert "Novice" to 1, "Emerging" to 2, "Approaching" to 3, and "Proficient" to 4.
I have tried to convert a single variable with
sam$x.r <- recode(sam$x.r,'Low Outlier'=0,'High Outlier'=0,'Novice'=1,'Emerging'=2,'Approaching'=3, 'Proficient'=4)
I received an error message of "Warning message:
In recode.numeric(Dat17_18.1$I.E.ScoreStat, Low Outlier = 0, High Outlier = 0, :
NAs introduced by coercion"
I am not sure how to recode all of the variables at once.
Recoding each value by hand gets repetitive really quickly. Here's a simple function:
# df: data frame or matrix to modify
# y:  what you want to replace (treated as a regex)
# z:  the replacement
my_replacer <- function(df, y, z) {
  df <- as.data.frame(apply(df, 2, function(x) gsub(y, z, x)))
  df
}
my_replacer(sam,"Emerging.*","2")
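To see what the function does to a single column, here is a minimal sketch on a toy vector (the vector `col` is made up for illustration, using labels from the question). It suggests two things worth keeping in mind: `gsub()` leaves `NA` alone, and the replacement is still a character string rather than a number.

```r
# Toy vector standing in for one column of sam (labels taken from the question)
col <- c("Emerging", "Novice", NA, "Proficient")

# gsub() replaces every regex match and leaves non-matching values untouched
out <- gsub("Emerging.*", "2", col)

print(out)         # the replaced entry is the string "2", not the number 2
print(is.na(out))  # the NA entry is still NA
```

Converting to numeric codes would still need an `as.numeric()` step once every label has been replaced.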
Here is how I've used it:
library(dplyr)  # can use ifelse. Still repetitive
sam <- as.data.frame(sam)
sam <- sam %>%
  mutate_if(is.factor, as.character)
my_replacer(sam,"Emerging.*","2")
Result:
             x            y            z
1  Low Outlier       Novice High Outlier
2 High Outlier  Approaching   Proficient
3       Novice   Proficient  Approaching
4       Novice  Approaching            2
5            2 High Outlier  Low Outlier
6         <NA>   Proficient  Approaching
7   Proficient         <NA>  Approaching
8  Approaching            2            2
Replace others:
my_replacer(sam,"Novi.*","1")
             x            y            z
1  Low Outlier            1 High Outlier
2 High Outlier  Approaching   Proficient
3            1   Proficient  Approaching
4            1  Approaching     Emerging
5     Emerging High Outlier  Low Outlier
6         <NA>   Proficient  Approaching
7   Proficient         <NA>  Approaching
8  Approaching     Emerging     Emerging
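If calling the function once per label still feels repetitive, one possible alternative (not the approach above, just a sketch in base R) is a named lookup vector that maps every label to its code, applied to all columns in one pass. The names `codes` and `sam_num` are made up for this example, and it assumes the data is built with `data.frame()` rather than `cbind()` so the columns are plain character vectors.

```r
# Sample data from the question, built as a data frame instead of a matrix
x <- c("Low Outlier", "High Outlier", "Novice", "Novice", "Emerging", NA, "Proficient", "Approaching")
y <- c("Novice", "Approaching", "Proficient", "Approaching", "High Outlier", "Proficient", NA, "Emerging")
z <- c("High Outlier", "Proficient", "Approaching", "Emerging", "Low Outlier", "Approaching", "Approaching", "Emerging")
sam <- data.frame(x, y, z, stringsAsFactors = FALSE)

# Hypothetical lookup table: label -> numeric code
codes <- c("Low Outlier" = 0, "High Outlier" = 0, "Novice" = 1,
           "Emerging" = 2, "Approaching" = 3, "Proficient" = 4)

# Index the lookup by each column's labels; an NA label yields NA.
# as.character() guards against factor columns being used as integer indices.
sam_num <- as.data.frame(lapply(sam, function(col) unname(codes[as.character(col)])))

print(sam_num)
```

Because the lookup values are numeric, the resulting columns are numeric as well, so no separate `as.numeric()` step is needed. Any label missing from `codes` would also come back as `NA`, so it may be worth checking for unexpected labels first.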