Search code examples
rdatedplyrcountfrequency

R Count monthly frequency of occurrence between date range


x <- data.frame(ID = c(1,2,3,4),
            Line_name = c("AB", "CD", "AB", "CD"),
            start_dt = c("12/1/2020", "2/1/2021", "2/1/2021", "3/1/2021"),
            end_dt = c("4/1/2021", "4/1/2021", "3/1/2021", "4/1/2021"))

ID Line_name start_dt   end_dt
1        AB  12/1/2020 4/1/2021
2        CD  2/1/2021 4/1/2021
3        AB  2/1/2021 3/1/2021
4        CD  3/1/2021 4/1/2021

I have a dataframe that looks likes this. It has items that is used within date range (start date to end date). I need to count frequency of use of each item for every month. The resulting output would look something like this.

    Line_name Jan2021 Feb2021 Mar2021 Apr2021
1        AB       1       2       2       1
2        CD       0       1       2       2

In Jan, only AB was used. For that ID 1, the date ranges from Jan to April. So we would need to count that row for every month from Jan to April.

I am not sure how I can do it. I was thinking for instance January, I would check if 1/1/2021 date falls within start_dt and end_dt and if that condition is true than count. (date %within% interval(start_dt, end_dt))


Solution

  • An option is to get a sequence of dates by month between the 'start_dt', and 'end_dt' columns with map2 into a list, then unnest the list column, get the count and reshape back from 'long' to 'wide' with pivot_wider

    library(lubridate)
    library(dplyr)
    library(tidyr)
    x %>% 
       transmute(Line_name, Year_month = map2(mdy(start_dt), mdy(end_dt),
          ~ format(seq(.x, .y, by = '1 month'), '%b%Y'))) %>%
       unnest(c(Year_month)) %>%
       count(Line_name, 
            Year_month = factor(Year_month, levels = unique(Year_month))) %>%
        pivot_wider(names_from = Year_month, values_from = n, values_fill = 0)
    

    -output

    # A tibble: 2 x 5
      Line_name Jan2021 Feb2021 Mar2021 Apr2021
      <chr>       <int>   <int>   <int>   <int>
    1 AB              1       2       2       1
    2 CD              0       1       2       2