Search code examples
rdplyrtime-seriesfill

dplyr::complete/fill a time sequence, but only for limited stretches of time


I'm trying to use dplyr::complete and fill to fill in gaps in a time sequence of animal weights (weighed roughly weekly most of the time), BUT I only want to do it within certain limits.

In the following example dataset, several dates are missing: a single weighing on 1/29/2020 and a series of 4 missing weeks in March/April. We are okay with missing 1 week of weighing (such as on 1/29) and are fine "filling" down the original weight for two weeks, but don't want to go any further than that. The second set of missing data should only be filled for 13 more days, and then the rest of the gap should be NA for the wt_g.

library(tidyverse)
library(lubridate)

animalwts <- tibble::tribble(
      ~Animal,     ~WtDate, ~Wt_g,
      "A",  "1/1/2020",   20L,
      "A",  "1/8/2020",   21L,
      "A", "1/15/2020",   21L,
      "A", "1/22/2020",   23L,
      "A",  "2/5/2020",   25L,
      "A", "2/12/2020",   23L,
      "A", "2/19/2020",   24L,
      "A", "2/26/2020",   23L,
      "A",  "3/4/2020",   22L,
      "A",  "4/8/2020",   24L
    ) %>%
        mutate(WtDate = mdy(WtDate))

The following code works to complete the date series and fill in all the missing data

animalwts %>%
  group_by(Animal) %>%
  complete(WtDate = seq.Date(min(WtDate), max(WtDate), by = "day")) %>%
  fill(Wt_g) 

But I'm trying to figure out how to complete all dates, but only fill in weights for two weeks at most from any given date, and put NAs for any further missing data.

I'd like to stay "in the pipe" if possible.


Solution

  • Like this?

    library(tidyverse)
    library(lubridate)
    
    animalwts %>%
      group_by(Animal) %>%
      mutate(NA_lag = WtDate - lag(WtDate),
             last_measurement_date = WtDate) %>% 
      complete(WtDate = seq.Date(min(WtDate), max(WtDate), by = "day")) %>%
      fill(Wt_g) %>% 
      fill(last_measurement_date) %>% 
      group_by(last_measurement_date, NA_lag) %>% 
      mutate(days_missing = row_number()) %>% 
      mutate(Wt_g = if_else(days_missing > 14, NA_integer_, Wt_g))
    

    Data

    animalwts <- tibble::tribble(
      ~Animal,     ~WtDate, ~Wt_g,
      "A",  "1/1/2020",   20L,
      "A",  "1/8/2020",   21L,
      "A", "1/15/2020",   21L,
      "A", "1/22/2020",   23L,
      "A",  "2/5/2020",   25L,
      "A", "2/12/2020",   23L,
      "A", "2/19/2020",   24L,
      "A", "2/26/2020",   23L,
      "A",  "3/4/2020",   22L,
      "A",  "4/8/2020",   24L
    ) %>%
      mutate(WtDate = mdy(WtDate))