Search code examples
rdplyrgroupingsummarize

Spread function in R with multiple fields which constitute the key


Im working to transform my table in this way so i can join it on another table. Here is a sample of what my initial table looks like:

df1 <- data.frame(ID = c(1,1,1,2,2,2,3,3,3),
                  date=c('2021-11-01', '2021-11-01', '2021-11-02','2021-11-01',
                         '2021-11-01', '2021-11-02','2021-11-01', '2021-11-01', '2021-11-02'
                         ),
                  event_name = c('a', 'a', 'c', 'a', 'b', 'b','a', 'b', 'c'),
                  Time_duration = c(1, 3, 5, 9, 2, 4, 1, 6, 8))

And this is an example of how I'd like the data to look after spreading it:

df2 <- data.frame(ID = c(1,1, 2,2, 3,3),
                  date=c('2021-11-01', '2021-11-02',
                         '2021-11-01', '2021-11-02',
                         '2021-11-01', '2021-11-02'
                         ),
                  event_A_duration_sum = c(4, 0,
                                       9, 0,
                                       1,0),
                  event_B_duration = c(0, 0,
                                       2, 4,
                                       6,0),
                  event_C_duration = c(0, 5,
                                       0, 0,
                                       0,8))

In my final table, I need to have the data grouped by (ID, Date); a unique identified for my final table would be both the ID and date. Each ID and date can have multiple events of the same type, for instance.

I hope this makes sense. Should I concatenate my identifiers and then split them back, after? Or is there a better way to do this using DPLYR?

Cheers, appreciate any help.


Solution

  • library(data.table)
    
    dcast(data.table(df1), 
    ID + date ~ event_name,value.var = 'Time_duration',
     fun.aggregate = sum)
    
    Key: <ID, date>
          ID       date     a     b     c
       <num>     <char> <num> <num> <num>
    1:     1 2021-11-01     4     0     0
    2:     1 2021-11-02     0     0     5
    3:     2 2021-11-01     9     2     0
    4:     2 2021-11-02     0     4     0
    5:     3 2021-11-01     1     6     0
    6:     3 2021-11-02     0     0     8