I'm trying to use dplyr::complete
and fill
to fill in gaps in a time sequence of animal weights (weighed roughly weekly most of the time), BUT I only want to do it within certain limits.
In the following example dataset, several dates are missing: a single weighing on 1/29/2020 and a series of 4 missing weeks in March/April. We are okay with missing 1 week of weighing (such as on 1/29) and are fine "filling" down the original weight for two weeks, but don't want to go any further than that. The second set of missing data should only be filled for 13 more days, and then the rest of the gap should be NA for the wt_g.
library(tidyverse)
library(lubridate)
animalwts <- tibble::tribble(
~Animal, ~WtDate, ~Wt_g,
"A", "1/1/2020", 20L,
"A", "1/8/2020", 21L,
"A", "1/15/2020", 21L,
"A", "1/22/2020", 23L,
"A", "2/5/2020", 25L,
"A", "2/12/2020", 23L,
"A", "2/19/2020", 24L,
"A", "2/26/2020", 23L,
"A", "3/4/2020", 22L,
"A", "4/8/2020", 24L
) %>%
mutate(WtDate = mdy(WtDate))
The following code works to complete the date series and fill in all the missing data
animalwts %>%
group_by(Animal) %>%
complete(WtDate = seq.Date(min(WtDate), max(WtDate), by = "day")) %>%
fill(Wt_g)
But I'm trying to figure out how to complete
all dates, but only fill
in weights for two weeks at most from any given date, and put NAs for any further missing data.
I'd like to stay "in the pipe" if possible.
Like this?
library(tidyverse)
library(lubridate)
animalwts %>%
group_by(Animal) %>%
mutate(NA_lag = WtDate - lag(WtDate),
last_measurement_date = WtDate) %>%
complete(WtDate = seq.Date(min(WtDate), max(WtDate), by = "day")) %>%
fill(Wt_g) %>%
fill(last_measurement_date) %>%
group_by(last_measurement_date, NA_lag) %>%
mutate(days_missing = row_number()) %>%
mutate(Wt_g = if_else(days_missing > 14, NA_integer_, Wt_g))
Data
animalwts <- tibble::tribble(
~Animal, ~WtDate, ~Wt_g,
"A", "1/1/2020", 20L,
"A", "1/8/2020", 21L,
"A", "1/15/2020", 21L,
"A", "1/22/2020", 23L,
"A", "2/5/2020", 25L,
"A", "2/12/2020", 23L,
"A", "2/19/2020", 24L,
"A", "2/26/2020", 23L,
"A", "3/4/2020", 22L,
"A", "4/8/2020", 24L
) %>%
mutate(WtDate = mdy(WtDate))