Monthly Dummy Variables in dataset


I have a dataset with 10 columns. One of those columns is the date. I want to create dummy variables for every month. How do I go about doing this?

      Date     Col1     Col2  
2017-01-09        v        2
2017-05-01        s        7
2018-03-02        k        9

I can extract the month using lubridate:

library(lubridate)
df$MONTH <- month(df$Date)

      Date     Col1     Col2     MONTH
2017-01-09        v        2         1
2017-05-01        s        7         5
2018-03-02        k        9         3

How do I transform this so that a dummy variable for each month is column-bound (cbind) to the original data?

      Date     Col1     Col2   M1   M2   M3   M4   M5   M6   M7   M8   M9  M10  M11  M12
2017-01-09        v        2    1    0    0    0    0    0    0    0    0    0    0    0
2017-05-01        s        7    0    0    0    0    1    0    0    0    0    0    0    0
2018-03-02        k        9    0    0    1    0    0    0    0    0    0    0    0    0

Solution

  • One option is to apply tabulate to each value of 'MONTH' (with 12 bins) and assign the result to the new columns

    df[paste0("M", 1:12)] <- as.data.frame(t(sapply(df$MONTH, tabulate, 12)))
    

    Or use row/column indexing: start from a matrix of 0s and set to 1 the cells whose row index is the row number and whose column index is taken from 'MONTH'

    m1 <- matrix(0, nrow(df), 12)
    m1[cbind(seq_len(nrow(df)), df$MONTH)] <- 1
    df[paste0("M", 1:12)] <- m1
    df
    #        Date Col1 Col2 MONTH M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12
    #1 2017-01-09    v    2     1  1  0  0  0  0  0  0  0  0   0   0   0
    #2 2017-05-01    s    7     5  0  0  0  0  1  0  0  0  0   0   0   0
    #3 2018-03-02    k    9     3  0  0  1  0  0  0  0  0  0   0   0   0
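
For anyone who wants to run both approaches end to end, here is a minimal self-contained sketch. It rebuilds the three example rows from the question and assumes the Date column is stored as class Date. The month is extracted with base R's format() so the snippet runs without extra packages; lubridate's month() should give the same values. The names by_tabulate and by_index are only illustrative.

```r
# Rebuild the example data from the question (Date assumed to be class Date)
df <- data.frame(
  Date = as.Date(c("2017-01-09", "2017-05-01", "2018-03-02")),
  Col1 = c("v", "s", "k"),
  Col2 = c(2, 7, 9)
)

# Month number via base R; lubridate::month(df$Date) is the question's version
df$MONTH <- as.integer(format(df$Date, "%m"))

# tabulate() counts how often each integer 1..nbins occurs, so a single
# month value gives a length-12 vector containing exactly one 1
tabulate(5, nbins = 12)
# [1] 0 0 0 0 1 0 0 0 0 0 0 0

# Approach 1: one tabulate() call per row, transposed to rows x months
by_tabulate <- t(sapply(df$MONTH, tabulate, nbins = 12))

# Approach 2: matrix indexing with (row, column) pairs
by_index <- matrix(0, nrow(df), 12)
by_index[cbind(seq_len(nrow(df)), df$MONTH)] <- 1

# Both approaches should produce the same 0/1 pattern
all(by_tabulate == by_index)

# Attach the dummy columns M1..M12 to the original data
df[paste0("M", 1:12)] <- by_index
```

The indexing version makes a single vectorised assignment instead of one function call per row, which may matter on large data. Note that both approaches assume 'MONTH' has no missing values.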