R - How to one-hot encode a single column while keeping the other columns?


I have a data frame like this:

group   student exam_pass   subject
A       01      Y           Math
A       01      N           Science
A       01      Y           Japanese
A       02      N           Math
A       02      Y           Science
B       01      Y           Japanese
C       02      N           Math

I would like to get the following result:

group   student exam_pass   subject_Math  subject_Science  subject_Japanese
A       01      Y           1             0                0
A       01      N           0             1                0
A       01      Y           0             0                1
A       02      N           1             0                0
A       02      Y           0             1                0
B       01      Y           0             0                1
C       02      N           1             0                0

Here is the test data frame:

df <- data.frame(
  group = c('A', 'A', 'A', 'A', 'A', 'B', 'C'),
  student = c('01', '01', '01', '02', '02', '01', '02'),
  exam_pass = c('Y', 'N', 'Y', 'N', 'Y', 'Y', 'N'),
  subject = c('Math', 'Science', 'Japanese', 'Math', 'Science', 'Japanese', 'Math')
)

I have tried a for loop, but the original data is too large for that to be practical, and

mltools::one_hot(df, col = 'subject')

doesn't work either because of this error:

Error in `[.data.frame`(dt, , cols, with = FALSE) :
unused argument (with = FALSE)

Could anyone help me with this? Thanks!


Solution

  • library(tidyr)
    library(dplyr)
    
    # Add an indicator column, then spread it into one column per subject
    df %>% mutate(value = 1) %>% spread(subject, value, fill = 0)
    
      group student exam_pass Japanese Math Science
    1     A      01         N        0    0       1
    2     A      01         Y        1    1       0
    3     A      02         N        0    1       0
    4     A      02         Y        0    0       1
    5     B      01         Y        1    0       0
    6     C      02         N        0    1       0
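
Note that the spread() result above has six rows, not the seven asked for in the question: rows that share the same group, student and exam_pass (here A / 01 / Y, which has both Math and Japanese) are merged into one. If each original row should stay separate, one possible alternative is base R's model.matrix(), sketched below on the test data from the question. This is only an illustration, not the answer's method. The names `encoded`, `dummies` and `result` are made up for the example, and the factor() call is there only so the new columns appear in order of first appearance rather than alphabetically.

```r
# Test data from the question
df <- data.frame(
  group = c('A', 'A', 'A', 'A', 'A', 'B', 'C'),
  student = c('01', '01', '01', '02', '02', '01', '02'),
  exam_pass = c('Y', 'N', 'Y', 'N', 'Y', 'Y', 'N'),
  subject = c('Math', 'Science', 'Japanese', 'Math', 'Science', 'Japanese', 'Math')
)

# Work on a copy; set the level order to the order of first appearance
# (assumption: this is the column order wanted in the result)
encoded <- df
encoded$subject <- factor(encoded$subject, levels = unique(encoded$subject))

# "~ subject - 1" drops the intercept, so every level gets its own 0/1 column
dummies <- model.matrix(~ subject - 1, data = encoded)

# model.matrix() names the columns "subjectMath" etc.; insert the underscore
colnames(dummies) <- sub("^subject", "subject_", colnames(dummies))

# Keep every other column and append the indicator columns, one row per input row
result <- cbind(df[setdiff(names(df), "subject")], as.data.frame(dummies))
print(result)
```

Because nothing is grouped or reshaped, the row count stays the same as in df. One caveat: with default settings model.matrix() drops rows where subject is NA, so the cbind() step would fail on such data; missing values would need to be handled first.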