Search code examples
rpanel

Create balanced data set


I am using R and have a long data set as the one outlined below:

Date           ID     Status
2014-10-01     12      1
2015-04-01     12      1
2015-07-01     12      1
2015-09-01     12      1
2015-11-01     12      0
2016-01-01     12      0
2016-05-01     12      0
2016-08-01     12      1
2017-03-01     12      1
2017-05-01     12      1
2014-10-01     13      1
2015-04-01     13      1
2015-07-01     13      0
2015-11-01     14      0
2016-01-01     14      0
...

My goal is to create a "balanced" data i.e. each ID should occur for each of the 10 dates. The variable "Status" for the initially non-occurring observations should be labeled as N/A. In other words, the outcome should look like this:

Date           ID     Status
2014-10-01     12      1
2015-04-01     12      1
2015-07-01     12      1
2015-09-01     12      1
2015-11-01     12      0
2016-01-01     12      0
2016-05-01     12      0
2016-08-01     12      1
2017-03-01     12      1
2017-05-01     12      1
2014-10-01     13      1
2015-04-01     13      1
2015-07-01     13      N/A
2015-09-01     13      N/A
2015-11-01     13      N/A
2016-01-01     13      N/A
2016-05-01     13      N/A
2016-08-01     13      N/A
2017-03-01     13      N/A
2017-05-01     13      N/A
2014-10-01     14      N/A
2015-04-01     14      N/A
2015-07-01     14      N/A
2015-09-01     14      N/A
2015-11-01     14      0
2016-01-01     14      0
2016-05-01     14      N/A
2016-08-01     14      N/A
2017-03-01     14      N/A
2017-05-01     14      N/A
...

Thank you for your help!


Solution

  • The following worked for me:

    df_b <- data.frame(date = rep(unique(df$date), length(unique(df$id))),
                   id = rep(unique(df$id), each = length(unique(df$date))))
    
    balanced_data <- left_join(df_b, df)