
Mutating dummy variables in dplyr


I want to create 7 dummy variables, one for each day of the week, using dplyr.

So far, I have managed to do it with the to_dummy function from the sjmisc package, but it takes two steps: 1) create a data frame of dummies, 2) append it to the original data frame.

#Sample dataframe
mydf <- data.frame(x = letters[1:9],
                   day = c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Fri","Mon"))

#1. Create the 7 dummy variables separately
daysdummy <- sjmisc::to_dummy(mydf$day, suffix = "label")

#2. Append to the dataframe (bind_cols() comes from dplyr)
mydf <- dplyr::bind_cols(mydf, daysdummy)


> mydf
  x   day day_Fri day_Mon day_Sat day_Sun day_Thurs day_Tues day_Wed
1 a   Mon       0       1       0       0         0        0       0
2 b  Tues       0       0       0       0         0        1       0
3 c   Wed       0       0       0       0         0        0       1
4 d Thurs       0       0       0       0         1        0       0
5 e   Fri       1       0       0       0         0        0       0
6 f   Sat       0       0       1       0         0        0       0
7 g   Sun       0       0       0       1         0        0       0
8 h   Fri       1       0       0       0         0        0       0
9 i   Mon       0       1       0       0         0        0       0

My question is whether I can do this in a single workflow using dplyr, adding to_dummy to the pipe, perhaps with mutate?

*to_dummy documentation


Solution

  • If you want to do this with the pipe, you can do something like:

    library(dplyr)
    library(sjmisc)
    
    mydf %>% 
      to_dummy(day, suffix = "label") %>% 
      bind_cols(mydf) %>% 
      select(x, day, everything())
    

    Returns:

    # A tibble: 9 x 9
      x     day   day_Fri day_Mon day_Sat day_Sun day_Thurs day_Tues day_Wed
      <fct> <fct>   <dbl>   <dbl>   <dbl>   <dbl>     <dbl>    <dbl>   <dbl>
    1 a     Mon        0.      1.      0.      0.        0.       0.      0.
    2 b     Tues       0.      0.      0.      0.        0.       1.      0.
    3 c     Wed        0.      0.      0.      0.        0.       0.      1.
    4 d     Thurs      0.      0.      0.      0.        1.       0.      0.
    5 e     Fri        1.      0.      0.      0.        0.       0.      0.
    6 f     Sat        0.      0.      1.      0.        0.       0.      0.
    7 g     Sun        0.      0.      0.      1.        0.       0.      0.
    8 h     Fri        1.      0.      0.      0.        0.       0.      0.
    9 i     Mon        0.      1.      0.      0.        0.       0.      0.
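
    Since the question asks about mutate specifically, here is a minimal sketch that skips sjmisc and builds the dummies inside a single mutate() call. It assumes the sample mydf from the question; the helper names days and dummy_exprs are illustrative only, and rlang is used because it is installed alongside dplyr.

    ```r
    library(dplyr)

    # Sample data from the question
    mydf <- data.frame(
      x   = letters[1:9],
      day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon")
    )

    # Distinct day labels, sorted so the column order is predictable
    days <- sort(unique(as.character(mydf$day)))

    # One named expression per day, e.g. day_Fri = as.integer(day == "Fri")
    dummy_exprs <- lapply(days, function(d) rlang::expr(as.integer(day == !!d)))
    names(dummy_exprs) <- paste0("day_", days)

    # Splice all expressions into one mutate() call
    result <- mydf %>% mutate(!!!dummy_exprs)
    ```

    Unlike to_dummy, this sketch produces integer rather than double columns, and it keeps x and day in place, so no bind_cols or select step is needed.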
    

    With dplyr and tidyr we could do:

    library(dplyr)
    library(tidyr)
    
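    mydf %>% 
      mutate(var = 1) %>% 
      # spread() consumes the day column and creates day_Fri, day_Mon, ...
      spread(day, var, fill = 0, sep = "_") %>% 
      # join back to the original data to restore the day column
      left_join(mydf, by = "x") %>% 
      select(x, day, everything())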
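
    spread() has since been superseded in tidyr by pivot_wider(). A rough equivalent, assuming tidyr 1.0.0 or later and the sample mydf from the question, might look like the sketch below. The day_copy column is a hypothetical helper that lets day survive the reshape, so no join is needed afterwards.

    ```r
    library(dplyr)
    library(tidyr)

    # Sample data from the question
    mydf <- data.frame(
      x   = letters[1:9],
      day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon")
    )

    result <- mydf %>%
      # var is the indicator value; day_copy is consumed by the reshape
      mutate(var = 1, day_copy = day) %>%
      pivot_wider(
        names_from   = day_copy,
        values_from  = var,
        values_fill  = list(var = 0),  # absent combinations become 0
        names_prefix = "day_"
      )
    ```

    The dummy columns here appear in order of first occurrence in the data (day_Mon, day_Tues, ...) rather than alphabetically.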
    

    And with base R we could do something like:

    as.data.frame.matrix(table(mydf$x, mydf$day))
    

    Returns (only the dummy columns, with x as the row names):

      Fri Mon Sat Sun Thurs Tues Wed
    a   0   1   0   0     0    0   0
    b   0   0   0   0     0    1   0
    c   0   0   0   0     0    0   1
    d   0   0   0   0     1    0   0
    e   1   0   0   0     0    0   0
    f   0   0   1   0     0    0   0
    g   0   0   0   1     0    0   0
    h   1   0   0   0     0    0   0
    i   0   1   0   0     0    0   0
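
    The base R result above holds only the indicator columns. If they should be appended to mydf with the day_ prefix, one possible sketch uses model.matrix() and cbind(). It assumes the sample mydf from the question; the names dummies and out are illustrative.

    ```r
    # Sample data from the question
    mydf <- data.frame(
      x   = letters[1:9],
      day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon")
    )

    # "- 1" drops the intercept so every level of day gets its own column
    dummies <- model.matrix(~ day - 1, data = mydf)

    # model.matrix() names the columns dayFri, dayMon, ...; add the underscore
    colnames(dummies) <- sub("^day", "day_", colnames(dummies))

    # Append the indicator columns to the original data frame
    out <- cbind(mydf, dummies)
    ```

    The columns follow the sorted factor levels, which matches the column order shown in the question.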