Search code examples
rif-statementdplyrcumsum

cumsum NAs and other condition R


I've seen lots of questions like this but can't figure this simple problem out. I don't want to collapse the dataset. Say I have this dataset:

library(tidyverse)
library(lubridate)
df <- data.frame(group = c("a", "a", "a", "a", "a", "b", "b", "b"),
                 starts = c("2011-09-18", NA,  "2014-08-08", "2016-09-18", NA, "2013-08-08", "2015-08-08", NA),
                 ends = c(NA, "2013-03-06", "2015-08-08", NA, "2017-03-06", "2014-08-08", NA, "2016-08-08"))
df$starts <- parse_date_time(df$starts, "ymd")
df$ends <- parse_date_time(df$ends, "ymd")
df

  group     starts       ends
1     a 2011-09-18       <NA>
2     a       <NA> 2013-03-06
3     a 2014-08-08 2015-08-08
4     a 2016-09-18       <NA>
5     a       <NA> 2017-03-06
6     b 2013-08-08 2014-08-08
7     b 2015-08-08       <NA>
8     b       <NA> 2016-08-08

Desired output is:

  group     starts       ends epi
1     a 2011-09-18       <NA>   1
2     a       <NA> 2013-03-06   1
3     a 2014-08-08 2015-08-08   2
4     a 2016-09-18       <NA>   3
5     a       <NA> 2017-03-06   3
6     b 2013-08-08 2014-08-08   1
7     b 2015-08-08       <NA>   2
8     b       <NA> 2016-08-08   2

I was thinking something like this but obviously doesn't account for episodes where there is no NA

df <- df %>% 
  group_by(group) %>% 
  mutate(epi = cumsum(is.na(ends)))
df

I'm not sure how to incorporate cumsum(is.na) with condition if_else. Maybe I'm going at it the wrong way?

Any suggestions would be great.


Solution

  • A solution using dplyr. Assuming your data frame is well structured that each start always has an associated end record.

    df2 <- df %>%
      group_by(group) %>%
      mutate(epi = cumsum(!is.na(starts))) %>%
      ungroup()
    df2
    # # A tibble: 8 x 4
    #   group starts              ends                  epi
    #   <fct> <dttm>              <dttm>              <int>
    # 1 a     2011-09-18 00:00:00 NA                      1
    # 2 a     NA                  2013-03-06 00:00:00     1
    # 3 a     2014-08-08 00:00:00 2015-08-08 00:00:00     2
    # 4 a     2016-09-18 00:00:00 NA                      3
    # 5 a     NA                  2017-03-06 00:00:00     3
    # 6 b     2013-08-08 00:00:00 2014-08-08 00:00:00     1
    # 7 b     2015-08-08 00:00:00 NA                      2
    # 8 b     NA                  2016-08-08 00:00:00     2