I would like to go from an irregular panel data structure to a regular panel data structure, but I am struggling how to do it. Any suggestions welcome!
Current irregular panel data structure:
trackingid <- as.character(c(1470149111625446735))
timestamp <- as.character(c("2018-06-17", "2018-06-18", "2018-06-19", "2018-06-21", "2018-06-22", "2018-06-23"))
pageimp <- as.numeric(c(8, 1, 3, 4, 2, 3))
dt <- data.frame(trackingid,timestamp, pageimp)
Aspired regular panel data structure:
trackingid <- as.character(c(1470149111625446735))
timestamp <- as.character(c("2018-06-17", "2018-06-18", "2018-06-19", "2018-06-20", "2018-06-21", "2018-06-22", "2018-06-23"))
pageimp <- as.numeric(c(8, 1, 3, 0, 4, 2, 3))
dt <- data.frame(trackingid,timestamp, pageimp)
Note in my full data, I will have many more trackingids with varying irregular time-stamps. All prior solutions so far discussed only moving from an irregular time series to a regular time series not considering the panel nature of my data.
Can do:
library(tidyverse)
dt %>%
mutate(timestamp = as.Date(timestamp)) %>%
group_by(trackingid) %>%
complete(timestamp = seq(min(timestamp), max(timestamp), by = "day"), fill = list(pageimp = 0))
Output:
# A tibble: 7 x 3
# Groups: trackingid [1]
trackingid timestamp pageimp
<fct> <date> <dbl>
1 1470149111625446656 2018-06-17 8
2 1470149111625446656 2018-06-18 1
3 1470149111625446656 2018-06-19 3
4 1470149111625446656 2018-06-20 0
5 1470149111625446656 2018-06-21 4
6 1470149111625446656 2018-06-22 2
7 1470149111625446656 2018-06-23 3
Basically you group by trackingid
, expand your data by day from the minimum to maximum timestamp
, and make use of fill
argument to populate anything missing with 0.