
One-hot encoding in R: creating data frame column names from variables in a loop


I am using a data frame called "rawData" which has a column called "Season" with values ranging from 1 to 4. I am trying to use a loop to perform one-hot encoding, i.e., to create 4 new columns called "Season 1", "Season 2", "Season 3" and "Season 4", where each column holds a binary 1/0 indicator showing whether that row's season matches the one in the column name. So far I have tried this:

for (i in 1:4){
  text<-paste("Season", toString(i), sep = " ")
  if (rawData$season==i) {
    rawData$text<-1
  }
}

However, I just get one additional column called "text" with every value set to 1. I understand why R does this, but I cannot figure out how to make it do what I want. I also tried changing "rawData$text" in the if statement to "rawData$paste("Season", toString(i), sep = " ")<-1", but that gives me an error.
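The loop above fails for two separate reasons. First, "$" treats whatever follows it as a literal column name, so "rawData$text" always creates a column called "text", and "rawData$paste(...)" never calls paste() to build a name; the "[[" operator, by contrast, accepts a name stored in a variable. Second, if() expects a single TRUE/FALSE, but "rawData$season == i" is a whole vector with one value per row, and that vector is already the indicator you want. A minimal base R sketch of a corrected loop, assuming the column is named "Season" as in the question text and using made-up sample data:

```r
# Hypothetical sample data; assumes a column named "Season" holding values 1 to 4
rawData <- data.frame(Season = c(1, 3, 2, 4, 3, 4, 1, 1, 1))

for (i in 1:4) {
  # Build the new column name as a string, e.g. "Season 1" (paste's default sep is " ")
  colName <- paste("Season", i)
  # `[[` accepts a computed name, unlike `$`. The comparison produces a logical
  # vector covering every row, and as.integer() turns TRUE/FALSE into 1/0
  rawData[[colName]] <- as.integer(rawData$Season == i)
}

print(rawData)
```

Because the new names contain spaces, read them back with rawData[["Season 1"]] or with backticks, e.g. rawData$`Season 1`. Drop the as.integer() call if logical TRUE/FALSE columns are acceptable.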


Solution

  • df <- data.frame(
      group = c('A', 'A', 'A', 'A', 'A', 'B', 'C'),
      student = c('01', '01', '01', '02', '02', '01', '02'),
      exam_pass = c('Y', 'N', 'Y', 'N', 'Y', 'Y', 'N'),
      subject = c('Math', 'Science', 'Japanese', 'Math', 'Science', 'Japanese', 'Math')
    )
    
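    # dummy.data.frame() comes from the "dummies" package
    library(dummies)
    
    # Expand "subject" into one 0/1 indicator column per subject, using "_" as the name separator
    df1 <- dummy.data.frame(df, names=c("subject"), sep="_")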
    

    This reproducible example performs one-hot encoding without a for loop.

    The same approach works for the seasons example from your question:

    df1 <- data.frame(seasons = c(1,3,2,4,3,4,1,1,1))
    
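    # dummy.data.frame() comes from the "dummies" package
    library(dummies)
    
    # Expand "seasons" into one 0/1 column per season value, named with a "seasons_" prefix
    df2 <- dummy.data.frame(df1, names=c("seasons"), sep="_")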
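If you would rather not depend on an extra package (dummies has been archived on CRAN, so it may not install on current R versions), base R's model.matrix() can build the same indicator columns. This is a swapped-in technique, not what the dummies package does internally; a minimal sketch reusing the seasons data:

```r
# Same sample data as the seasons example
df1 <- data.frame(seasons = c(1, 3, 2, 4, 3, 4, 1, 1, 1))

# factor() makes seasons categorical; "- 1" drops the intercept so every
# level gets its own 0/1 column instead of one level becoming the baseline
indicators <- model.matrix(~ factor(seasons) - 1, data = df1)

# Rename "factor(seasons)1", "factor(seasons)2", ... to "seasons_1", "seasons_2", ...
colnames(indicators) <- paste("seasons", levels(factor(df1$seasons)), sep = "_")

# Keep the original column and append the indicators
df2 <- cbind(df1, indicators)
print(df2)
```

The factor() wrapper matters: with a plain numeric column, model.matrix() would return a single numeric column rather than one column per season. Under the default na.action, rows where seasons is NA are dropped, so check for missing values before the cbind().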