Search code examples
rperformancedata.tabledummy-variablemodel.matrix

Speed up this loop to create dummy columns with data.table and set in R


I have a data table and I want to create a new column for each unique day, and then assign a 1 in each row where the day matches the column name

I have done this using a for loop but I was wondering if there was any way to optimise it using data.table and set?

Here is an example

dt <- data.table(Week_Day = c("Monday", "Tuesday", "Wednesday",
                          "Thursday", "Friday", "Saturday", "Sunday"))

Day <- unique(dt$Week_Day)
for (i in 1:length(Day)) {
    if (Day[i] != "Sunday") {
        dt[, Day[i] := ifelse(Week_Day == Day[i], 1, 0)]
    }
}

my table is 298k rows and although it doesn't take long to execute (below), its part of a long script and I have quite a few inefficient loops so I am trying to get the overall run time down.

Time to run:

user  system elapsed
0.99    0.06    1.05

Thanks in advance.


Solution

  • Here's a different approach that, performs better - on my machine - than the original approach in the question

    1) Get unique days except Sunday

    Day <- setdiff(dt$Week_Day, "Sunday")
    

    2) Initialize new columns with 0:

    dt[, (Day) := 0L]
    

    3) Update with 1s by reference in a loop:

    for(x in Day) {
      set(dt, i = which(dt[["Week_Day"]] == x), j = x, value = 1L)
    }
    

    Simple performance comparison:

    dt1 <- data.table(Week_Day = sample(c("Monday", "Tuesday", "Wednesday",
                                  "Thursday", "Friday", "Saturday", "Sunday"), 3e5, TRUE))
    
    dt2 <- copy(dt1)
    
    
    system.time({
      Day <- setdiff(unique(dt$Week_Day), "Sunday")
      dt1[, (Day) := 0L]
      for(x in Day) {
        set(dt1, i = which(dt1[["Week_Day"]] == x), j = x, value = 1L)
      }
    })
    #       User      System verstrichen 
    #      0.029       0.003       0.032 
    
    system.time({
      Day <- unique(dt$Week_Day)
      for (i in 1:length(Day)) {
        if (Day[i] != "Sunday") {
          dt2[, Day[i] := ifelse(Week_Day == Day[i], 1L, 0L)]
        }
      }
    })
    
    #       User      System verstrichen 
    #      0.138       0.070       0.210 
    
    
    all.equal(dt1, dt2)
    #[1] TRUE