I'm working with data that looks like this:
name start_date end_date
A 1993-06-25 1993-11-04
B 2003-12-12 2004-07-20
C 1997-06-11 2000-11-27
D 1990-06-29 1992-07-02
I want to count up the number of months in each year that the names span.
So that the data looks like this:
name year number_months
A 1993 5
B 2003 1
B 2004 7
C 1997 6
C 1998 12
C 1999 12
C 2000 11
D 1990 6
D 1991 12
D 1992 7
Here is a reproducible example:
name <- c("A", "B", "C", "D")
start_date <- as.Date(c("1993-06-25", "2003-12-12", "1997-06-11", "1990-06-29"))
end_date <- as.Date(c("1993-11-04", "2004-07-20", "2000-11-27", "1992-07-02"))
df <- data.frame(name, start_date, end_date)
An option in tidyverse
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(lubridate)
df %>%
transmute(name, out = map2(start_date, end_date,
~ seq(.x, .y, by = 'months') %>%
year %>%
table %>%
enframe(name = 'year', value = 'number_months'))) %>%
unnest(c(out))
Or another option with interval
df %>%
transmute(name, out = map2(start_date, end_date,
~ tibble(date = seq(.x, .y, by = 'months'), year = year(date)) %>%
group_by(year) %>%
summarise(number_months = interval(floor_date(first(date), 'month'),
ceiling_date(last(date), 'month')) %/% months(1)) )) %>%
unnest(c(out))
Or in base R
(extending @rawr's solution)
do.call(rbind, Map(function(x, y, z)
cbind(name = z, stack(table(format(seq(x, y, by = 'months'),
'%Y')))), df$start_date, df$end_date, df$name))
Or as @rawr commented, replacing the stack
with data.frame
can be used as well
do.call(rbind, Map(function(x, y, z)
cbind(name = z, data.frame(table(format(seq(x, y, by = 'months'),
'%Y')))), df$start_date, df$end_date, df$name))