Search code examples
rtime-seriespanel

How to generate a regular panel from an irregular panel time series data


I would like to go from an irregular panel data structure to a regular panel data structure, but I am struggling how to do it. Any suggestions welcome!

Current irregular panel data structure:

trackingid <- as.character(c(1470149111625446735))
timestamp <- as.character(c("2018-06-17", "2018-06-18", "2018-06-19", "2018-06-21", "2018-06-22", "2018-06-23"))
pageimp <- as.numeric(c(8, 1, 3, 4, 2, 3))
dt <- data.frame(trackingid,timestamp, pageimp)

Aspired regular panel data structure:

trackingid <- as.character(c(1470149111625446735))
timestamp <- as.character(c("2018-06-17", "2018-06-18", "2018-06-19", "2018-06-20", "2018-06-21", "2018-06-22", "2018-06-23"))
pageimp <- as.numeric(c(8, 1, 3, 0, 4, 2, 3))
dt <- data.frame(trackingid,timestamp, pageimp)

Note in my full data, I will have many more trackingids with varying irregular time-stamps. All prior solutions so far discussed only moving from an irregular time series to a regular time series not considering the panel nature of my data.


Solution

  • Can do:

    library(tidyverse)
    
    dt %>%
      mutate(timestamp = as.Date(timestamp)) %>%
      group_by(trackingid) %>%
      complete(timestamp = seq(min(timestamp), max(timestamp), by = "day"), fill = list(pageimp = 0))
    

    Output:

    # A tibble: 7 x 3
    # Groups:   trackingid [1]
      trackingid          timestamp  pageimp
      <fct>               <date>       <dbl>
    1 1470149111625446656 2018-06-17       8
    2 1470149111625446656 2018-06-18       1
    3 1470149111625446656 2018-06-19       3
    4 1470149111625446656 2018-06-20       0
    5 1470149111625446656 2018-06-21       4
    6 1470149111625446656 2018-06-22       2
    7 1470149111625446656 2018-06-23       3
    

    Basically you group by trackingid, expand your data by day from the minimum to maximum timestamp, and make use of fill argument to populate anything missing with 0.