R: How to fill yearly data within monthly data?


I'm trying to load data from Quandl with collapse = "monthly". Some of the values are only available on a yearly or half-yearly basis, and others are only available for certain periods of time. This leaves me with a lot of inhomogeneous data. How can I fill the yearly and half-yearly data in a "last observation carried forward" fashion and set the remaining NAs to 0?

Here is a sketch of the data I have and the data I want to end up with:

library(tibble)

set.seed(4711)

# How do I get from:
#
df.start <- tibble(  # tibble() replaces the deprecated data_frame()
  Date = seq.Date(as.Date("1990-01-01"), as.Date("1999-12-01"), "1 month"),
  B = rep(NA, 120),
  C = c(rep(NA, 50), rnorm(120 - 50)),
  D = rep(c(rnorm(1), rep(NA, 11)), 10),
  E = c(rep(NA, 24), rep(c(rnorm(1), rep(NA, 11)), 8)),
  F = c(rep(NA, 45), rnorm(50), rep(NA, 25)),
  G = c(rep(NA, 24), rep(c(rnorm(1), rep(NA, 11)), 6), rep(NA, 24)),
  H = c(rep(NA, 10), rnorm(20), rep(NA, 16), rnorm(37), rep(NA, 37)),
  I = rep(c(rnorm(1), rep(NA, 5)), 20)
)
#
# To:
#
df.end <- tibble(  # tibble() replaces the deprecated data_frame()
  Date = seq.Date(as.Date("1990-01-01"), as.Date("1999-12-01"), "1 month"),
  B = rep(0, 120),
  C = c(rep(0, 50), rnorm(120 - 50)),
  D = rep(rnorm(10), each = 12),
  E = c(rep(0, 24), rep(rnorm(8), each = 12)),
  F = c(rep(0, 45), rnorm(50), rep(0, 25)),
  G = c(rep(0, 24), rep(rnorm(6), each = 12), rep(0, 24)),
  H = c(rep(0, 10), rnorm(20), rep(0, 16), rnorm(37), rep(0, 37)),
  I = rep(rnorm(20), each = 6)
)
#
# Automatically?
#

Solution

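  • You can use tidyr's fill() to carry the last non-NA value forward in every column except Date, and then replace the remaining NAs with 0. Both steps are done grouped by year, so a value is never carried past December of its own year: a year with no observation at all stays NA and ends up as 0, while a gap that starts mid-year is still filled up to the end of that year.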

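    library(tidyverse)
    library(lubridate)

    # Every column except Date holds values to fill
    value_cols <- colnames(df.start)[-1]

    df.end <- df.start %>%
      mutate(year = year(Date)) %>%
      group_by(year) %>%
      # Carry the last observation forward, restarting in each year
      fill(all_of(value_cols)) %>%
      ungroup() %>%
      # Set whatever is still NA to 0; as.numeric() covers the all-NA logical column B
      mutate(across(all_of(value_cols), ~ coalesce(as.numeric(.x), 0))) %>%
      select(-year)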
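
To see what the grouping changes, here is a minimal sketch on a toy tibble. The name toy and its values are made up for illustration and are not part of the Quandl data. It runs the same two steps (grouped fill, then NA to 0) on a single column, so the result can be checked by hand.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical values): three rows per year
toy <- tibble(
  year = c(1990, 1990, 1990, 1991, 1991, 1991),
  x    = c(1.5,  NA,   NA,   NA,   2.5,  NA)
)

filled <- toy %>%
  group_by(year) %>%
  fill(x) %>%                 # last observation carried forward, restarted per year
  ungroup() %>%
  mutate(x = coalesce(x, 0))  # NAs that fill() could not reach become 0

filled$x
# 1.5 1.5 1.5 0.0 2.5 2.5
```

The first row of 1991 becomes 0 because fill() finds no earlier value inside that group. Without group_by(year) it would have been filled with 1.5 from 1990.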