Re-express categorical field values using R


I have a dataset with a column called education, which holds several category names. I want to replace those names with numeric values. After doing that, the new column in the dataset contains only NA.

Here is my attempt:

library(plyr)                 #Load plyr package 

edu.num <- revalue(x = bank_train$education,replace = 
                     c("illiterate" = 0,
                       "basic.4y" = 4,
                       "basic.6y" = 6,
                       "basic.9y" = 9,
                       "high.school" = 12,
                       "professional.course" = 12,
                       "university.degree" = 16,
                       "unknown" = NA))
bank_train$education_numeric <- as.numeric(levels(edu.num))[edu.num]




Solution

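  • revalue returns the same type it is given: a factor for a factor, a character vector for a character vector. Here bank_train$education appears to be a character vector, so edu.num is a character vector too, and levels(edu.num) returns NULL, because levels() only applies to factors. Indexing the resulting empty numeric vector by edu.num then yields NA for every row.

    So you only need to change the last line of the code: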

    library(plyr)#Load plyr package 
    
    edu.num <- revalue(x = bank_train$education,replace = 
                     c("illiterate" = 0,
                       "basic.4y" = 4,
                       "basic.6y" = 6,
                       "basic.9y" = 9,
                       "high.school" = 12,
                       "professional.course" = 12,
                       "university.degree" = 16,
                       "unknown" = NA))
    # as.character() first, so this also works if edu.num is a factor
    bank_train$education_numeric <- as.numeric(as.character(edu.num))
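
To make the cause of the NA values easier to see, here is a minimal base-R sketch that does not use plyr. The education vector is a made-up stand-in for bank_train$education and is assumed to be stored as character; years and education_numeric are illustrative names. The same mapping is done with a named lookup vector, and going through as.character() keeps it safe for factor input as well, since as.numeric() applied directly to a factor returns the internal level codes rather than the labels.

```r
# Hypothetical stand-in for bank_train$education, stored as character
education <- c("basic.4y", "university.degree", "unknown", "high.school")

# Named lookup vector: names are the old labels, values are the new numbers
years <- c("illiterate" = 0, "basic.4y" = 4, "basic.6y" = 6, "basic.9y" = 9,
           "high.school" = 12, "professional.course" = 12,
           "university.degree" = 16, "unknown" = NA)

# Why the original last line gives NA: a character vector has no levels,
# so an empty numeric vector is being indexed by name
no_levels <- levels(education)                      # NULL
all_na <- as.numeric(levels(education))[education]  # NA NA NA NA

# Look each label up by name; unname() drops the labels from the result
education_numeric <- unname(years[as.character(education)])
education_numeric                                   # 4 16 NA 12

# Same result when the column is a factor instead of a character vector
from_factor <- unname(years[as.character(factor(education))])
```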