Search code examples
rdataframedata-analysismissing-datadata-cleaning

How do I add missing rows (filled with 0's) in a data frame using some column as reference?


I have this (big) data frame and I want to add the missing rows (filled with 0's) in relation to the column named "date".

For example:

date    0        1         2          3         4         5
199807  0       546.98     19355.55   1110.6    4720.34   3435.9
199808  0       1668.85    4398.29    5816.71   599.7     535.28
199809  608.79  446.68     2297.68    1862.64   415.6     933.46
199811  0       0          1366.59    1337.01   0         711.59
199812  0       0          601.8      2079.9    2030.24   368.44
199901  0       1462.8     0          552.78    0         0
199903  0       621.2      2242       753.36    0         0

And I would like this:

date    0        1        2            3         4          5
199807  0        546.98   19355.55     1110.6    4720.34    3435.9
199808  0        1668.85  4398.29      5816.71   599.7      535.28
199809  608.79   446.68   2297.68      1862.64   415.6      933.46
199810  0        0        0            0         0          0
199811  0        0        1366.59      1337.01   0          711.59
199812  0        0        601.8        2079.9    2030.24    368.44
199901  0        1462.8   0            552.78    0          0
199902  0        0        0            0         0          0
199903  0        621.2    2242         753.36    0          0

Thank you very much


Solution

  • It can be done as follows:

    1) Create a placeholder dataset with all dates and zeros in all other columns

    2) for dates, which are in the original dataset, write values from original to placeholder dataset

    # Create a placeholder dataset, including all dates, 0 at all other columns
    df <- data.frame(date = c(199801:199812, 199901:199912),
                     `1` = 0, `2` = 0, `3` = 0, check.names = FALSE)
    
    # create the dataset with some data (this will be the 'original' dataset)
    df2 <- data.frame(date = c(199807:199809, 199811:199812, 199901:199903),
                      `1` = rnorm(8), `2` = rnorm(8), `3` = rnorm(8), check.names = FALSE)
    
    # write data from original dataset to placeholder dataset
    df[df$date %in% df2$date,] <- df2