Calculating number of days using criterion

Overview

For the sake of a hypothetical: imagine that I'm an avid sportsman. I have a dataset that logs every time I go boating/skiing/whatever. I want to calculate the number of days that have passed since the last hypothetical accident that occurred in each given locale.

My data

Here is a small reproducible set of my data:

mydata <- data.frame(state = c(rep("Vermont", 5), rep("New Hampshire", 5)),
                     date = c("2016-01-01", "2016-01-03", "2016-01-04", "2016-01-04", "2016-02-01",
                              "2016-01-03", "2016-01-15", "2016-01-16", "2016-02-01", "2016-02-03"),
                     accident = c(1, 0, 0, 1, 1,
                                  0, 1, 1, 0, 1))

Here's what it looks like. Notice that the dates are "ragged" - I sometimes ski/boat two days in a row, and other times I take a week-long break. Also notice that I'm doing these sports activities in multiple states, and I want to group_by the state, resulting in an NA value for the first time I ski/boat in that area.

          state       date accident
        Vermont 2016-01-01        1
        Vermont 2016-01-02        0
        Vermont 2016-01-03        0
        Vermont 2016-01-04        1
        Vermont 2016-02-01        1
  New Hampshire 2016-01-03        0
  New Hampshire 2016-01-15        1
  New Hampshire 2016-01-16        1
  New Hampshire 2016-02-01        0
  New Hampshire 2016-02-03        1

I want to produce this:

          state       date accident   numdays
        Vermont 2016-01-01        1        NA
        Vermont 2016-01-02        0         1
        Vermont 2016-01-03        0         2
        Vermont 2016-01-04        1         3
        Vermont 2016-02-01        1        28
  New Hampshire 2016-01-03        0        NA
  New Hampshire 2016-01-15        1        NA
  New Hampshire 2016-01-16        1         1
  New Hampshire 2016-02-01        0        16
  New Hampshire 2016-02-03        1        18

*Edited to correct typo in data

Solution

Here is one option :

Create a new column which has dates when an accident occurs. Use tidyr::fill to fill the days when accident does not occur with the previous value. For each state calculate number of days passed since the last accident occur.

library(dplyr)

mydata %>%
  mutate(date = as.Date(date), 
         numdays = replace(date, accident == 0, NA)) %>%
  group_by(state) %>%
  mutate(numdays = lag(numdays)) %>%
  tidyr::fill(numdays) %>%
  mutate(numdays = as.integer(date - numdays))

#   state         date       accident numdays
#   <chr>         <date>        <dbl>   <int>
# 1 Vermont       2016-01-01        1      NA
# 2 Vermont       2016-01-02        0       1
# 3 Vermont       2016-01-03        0       2
# 4 Vermont       2016-01-04        1       3
# 5 Vermont       2016-02-01        1      28
# 6 New Hampshire 2016-01-03        0      NA
# 7 New Hampshire 2016-01-15        1      NA
# 8 New Hampshire 2016-01-16        1       1
# 9 New Hampshire 2016-02-01        0      16
#10 New Hampshire 2016-02-03        1      18

data

There was probably a typo in few of date entries which I have corrected below.

mydata <- data.frame(state = c(rep("Vermont", 5), rep("New Hampshire", 5)),
                     date = c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04", "2016-02-01",
                              "2016-01-03", "2016-01-15", "2016-01-16", "2016-02-01", "2016-02-03"),
                     accident = c(1, 0, 0, 1, 1,
                                  0, 1, 1, 0, 1))