r  time-series  data-manipulation  missing-data  data-wrangling

Impute missing variable IDs into a time series panel


For a time series analysis, I want to use a data frame that looks like this:

data <- data.frame (Store_ID = as.character(c(seq( 1, length.out = 10),
                                              seq( 1, length.out = 9),
                                              c(1,2,3,4,6,7,8,9))),
                    amount_sold = c(seq( 1, 9, length.out = 27)),
                    date = c(rep(as.Date("2015-01-01"),10),
                             rep(as.Date("2015-01-02"),9),
                             rep(as.Date("2015-01-03"),8)
                             )
                            )

As you can see, there are 10 Store_IDs for the first date (2015-01-01), but only 9 for the second date and 8 for the last date.

For my analysis I need to add the Store_IDs that are missing on the last two dates. The result should have 30 rows (10 stores on each of 3 dates), with amount_sold set to 0 for the rows that were missing.


Solution

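  • Try complete() from tidyr. It adds a row for every Store_ID/date combination that is not in the data and fills amount_sold with 0 for those rows: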

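    library(tidyr)  # tidyr also re-exports the %>% pipe

    # Example data from the question: 27 rows, some Store_IDs missing on later dates
    data <- data.frame(Store_ID = as.character(c(seq(1, length.out = 10),
                                                 seq(1, length.out = 9),
                                                 c(1, 2, 3, 4, 6, 7, 8, 9))),
                       amount_sold = seq(1, 9, length.out = 27),
                       date = c(rep(as.Date("2015-01-01"), 10),
                                rep(as.Date("2015-01-02"), 9),
                                rep(as.Date("2015-01-03"), 8))) %>%
      # Add each missing Store_ID/date combination with amount_sold = 0
      complete(Store_ID, date, fill = list(amount_sold = 0))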
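
If you would rather not depend on tidyr, roughly the same result can be sketched in base R: build every Store_ID/date combination with expand.grid(), left-join the observed rows with merge(), and replace the resulting NA values with 0. This is an alternative to the approach above, not part of the original answer; the names full_grid and filled are chosen here purely for illustration.

```r
# Example data from the question (27 rows)
data <- data.frame(
  Store_ID = as.character(c(1:10, 1:9, c(1, 2, 3, 4, 6, 7, 8, 9))),
  amount_sold = seq(1, 9, length.out = 27),
  date = c(rep(as.Date("2015-01-01"), 10),
           rep(as.Date("2015-01-02"), 9),
           rep(as.Date("2015-01-03"), 8))
)

# Every Store_ID/date combination: 10 stores x 3 dates
# (full_grid is an illustrative name)
full_grid <- expand.grid(
  Store_ID = unique(data$Store_ID),
  date = unique(data$date),
  stringsAsFactors = FALSE
)

# Left join: keep all grid rows; combinations without data get NA
filled <- merge(full_grid, data, by = c("Store_ID", "date"), all.x = TRUE)

# Replace the NA values introduced by the join with 0
filled$amount_sold[is.na(filled$amount_sold)] <- 0

nrow(filled)                   # 30
sum(filled$amount_sold == 0)   # 3 (store 10 on 01-02, stores 5 and 10 on 01-03)
```

Note that Store_ID is stored as character in the question, so sorting the result by Store_ID orders it as text ("1", "10", "2", ...). Convert it to integer first if numeric order matters.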