Search code examples
rdatemonthcalendardata-management

Number of months between two dates for each year in r


I'm working with data that looks like this:

name       start_date        end_date
 A         1993-06-25       1993-11-04
 B         2003-12-12       2004-07-20
 C         1997-06-11       2000-11-27
 D         1990-06-29       1992-07-02

I want to count up the number of months in each year that the names span.

So that the data looks like this:

name  year number_months
A     1993    5
B     2003    1
B     2004    7
C     1997    6
C     1998   12
C     1999   12
C     2000   11
D     1990    6
D     1991   12
D     1992    7

Here is a reproducible example:

name <- c("A", "B", "C", "D")
start_date <- as.Date(c("1993-06-25", "2003-12-12", "1997-06-11", "1990-06-29"))
end_date <- as.Date(c("1993-11-04", "2004-07-20", "2000-11-27", "1992-07-02"))

df <- data.frame(name, start_date, end_date)


Solution

  • An option in tidyverse

    library(dplyr)
    library(tidyr)
    library(purrr)
    library(tibble)
    library(lubridate)
    df %>% 
       transmute(name,  out = map2(start_date, end_date,
         ~ seq(.x, .y, by = 'months') %>% 
               year %>%
               table %>%
               enframe(name = 'year', value = 'number_months'))) %>% 
       unnest(c(out))
    

    Or another option with interval

    df %>% 
         transmute(name,  out = map2(start_date, end_date,
              ~ tibble(date = seq(.x, .y, by = 'months'), year = year(date)) %>%
                   group_by(year) %>%
                   summarise(number_months = interval(floor_date(first(date), 'month'), 
                       ceiling_date(last(date), 'month')) %/% months(1)) )) %>%
         unnest(c(out))
    

    Or in base R (extending @rawr's solution)

    do.call(rbind, Map(function(x, y, z) 
     cbind(name = z, stack(table(format(seq(x, y, by = 'months'), 
      '%Y')))), df$start_date, df$end_date, df$name))
    

    Or as @rawr commented, replacing the stack with data.frame can be used as well

    do.call(rbind, Map(function(x, y, z) 
         cbind(name = z, data.frame(table(format(seq(x, y, by = 'months'), 
           '%Y')))), df$start_date, df$end_date, df$name))