Search code examples
rdatelaglubridatedplyr

Using lag in mutate() on a arranged data


I am working on a data set which is similar to

data <-tribble(
  ~id, ~ dates, ~days_prior,
  1,20190101, NA,
  1,NA, 15,
  1,NA, 20,
  2, 20190103, NA,
  2,NA, 3,
  2,NA, 4)

I have the first date for each ID and I am trying to calculate the next date by adding days_prior to the previous date. I am using the lag function to refer to the previous date.

 df<- df%>% mutate(dates = as.Date(ymd(dates)), days_prior =as.integer(days_prior))


 df<-df %>% mutate(dates =
   as.Date(ifelse(is.na(days_prior),dates,days_prior+lag(dates)),
   origin="1970-01-01"))

This works but only for the next row as you can see attached data.

enter image description here

What am I doing wrong? I would like all the dates to be calculated by mutate(). What different approach should I take to calculate this.


Solution

  • I don't really see how lag would help here; unless I misunderstood here is an option using tidyr::fill

    data %>%
        group_by(id) %>%
        mutate(dates = as.Date(ymd(dates))) %>%
        fill(dates) %>%
        mutate(dates = dates + if_else(is.na(days_prior), 0L, as.integer(days_prior))) %>%
        ungroup()
    ## A tibble: 6 x 3
    #     id dates      days_prior
    #  <dbl> <date>          <dbl>
    #1     1 2019-01-01         NA
    #2     1 2019-01-16         15
    #3     1 2019-01-21         20
    #4     2 2019-01-03         NA
    #5     2 2019-01-06          3
    #6     2 2019-01-07          4
    

    Or a slight variation, replacing the NA entries in days_prior with 0

    data %>%
        group_by(id) %>%
        mutate(
            dates = as.Date(ymd(dates)),
            days_prior = replace(days_prior, is.na(days_prior), 0)) %>%
        fill(dates) %>%
        mutate(dates = dates + as.integer(days_prior)) %>%
        ungroup()
    

    Update

    In response to your clarifications in the comments, here is what you can do

    data %>%
        group_by(id) %>%
        mutate(
            dates = as.Date(ymd(dates)),
            days_prior = replace(days_prior, is.na(days_prior), 0)) %>%
        fill(dates) %>%
        mutate(dates = dates + cumsum(days_prior)) %>%
        ungroup()
    ## A tibble: 6 x 3
    #     id dates      days_prior
    #  <dbl> <date>          <dbl>
    #1     1 2019-01-01          0
    #2     1 2019-01-16         15
    #3     1 2019-02-05         20
    #4     2 2019-01-03          0
    #5     2 2019-01-06          3
    #6     2 2019-01-10          4