I would like to add seasonality dummies in my R data.table
based on quarters. I have looked at multiple examples but I haven't been able to solve this issue yet. My knowledge about R
is limited so I was wondering if you could get me on the right track.
My data.table
looks like this:
Year_week artist_id number_of_events number_of_streams
1: 16/30 8296 1 957892
2: 16/33 8296 6 882282
3: 16/34 8296 5 926037
4: 16/35 8296 2 952704
5: 15/37 17879 1 89515
6: 16/22 22690 2 119653
What I would like to have is a format like this:
Year_week artist_id number_of_events number_of_streams Q2 Q3 Q4
1: 16/50 8296 1 957892 0 0 1
Two approaches:
1) Using dcast
, cut
and sub
:
dcast(DT[, Q := cut(as.integer(sub('.*/','',Year_week)),
breaks = c(0,13,26,39,53),
labels = paste0('Q',1:4))],
Year_week + artist_id + number_of_events + number_of_streams ~ Q,
value.var = 'Q',
drop = c(TRUE,FALSE),
fun = length)
gives:
Year_week artist_id number_of_events number_of_streams Q1 Q2 Q3 Q4
1: 15/37 17879 1 89515 0 0 1 0
2: 16/22 22690 2 119653 0 1 0 0
3: 16/30 8296 1 957892 0 0 1 0
4: 16/33 8296 6 882282 0 0 1 0
5: 16/34 8296 5 926037 0 0 1 0
6: 16/35 8296 2 952704 0 0 1 0
What this does:
as.integer(sub('.*/','',Year_week))
extacts the weeknumber from the Year_week
columncut
you divide that into quarters with the appropriate labels (see also ?cut
)dcast
you transpose the quarter column to wide format with a aggregation function (length
). By using drop = c(TRUE,FALSE)
in the dcast
function, you make sure that all quarters are included.Notes:
Q
-column is an ordered factor, so you can use that to arrange and filter your data as well.Q
variable.dcast
step).2) Using cut
, sub
and lapply
:
DT[, Q := cut(as.integer(sub('.*/','',Year_week)),
breaks = c(0,13,26,39,53),
labels = paste0('Q',1:4))
][, paste0('Q',1:4) := lapply(paste0('Q',1:4), function(q) as.integer(q == Q))][]
which gives a similar result. Instead of transposing with dcast
, you just check whether one of the quarter labels is in the Q
-column.
Used data:
DT <- fread(' Year_week artist_id number_of_events number_of_streams
16/30 8296 1 957892
16/33 8296 6 882282
16/34 8296 5 926037
16/35 8296 2 952704
15/37 17879 1 89515
16/22 22690 2 119653')