Using this data (*corrected)
structure(list(Date = structure(c(1461, 1826, 2191, 2557, 2922,
3287, 3652, 4018, 4383, 4748, 5113, 5479, 5844, 6209, 6574, 6940,
7305, 7670, 8035, 8401, 8766, 9131, 9496, 9862, 10227, 10592,
10957, 11323, 11688, 12053, 12418, 12784, 13149, 13514, 13879,
14245, 14610, 14975, 15340, 15706, 16071, 16436, 16801, 17167,
17532, 17897), class = "Date"), State = c(1, 1, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1), Duration = c("",
"", "", "", "5", "", "2", "1", "", "", "3", "", "2", "", "2",
"1", "1", "1", "", "", "3", "", "", "", "4", "", "", "", "",
"", "", "", "", "", "10", "1", "", "2", "", "", "", "", "", "",
"", "8")), row.names = c(NA, -46L), class = "data.frame")
I would like to reach this new data.frame:
newdf <- data.frame(State = c("Expansion", "Contraction", "Expansion", "Contraction", "Expansion", "Contraction",
"Expansion", "Contraction", "Expansion", "Contraction", "Expansion", "Contraction",
"Expansion", "Contraction", "Expansion"),
Duration = c("5", "2", "1", "3", "2", "2", "1", "1", "1", "3", "4", "10", "1", "2", "8"),
Period = c("1974 - 1978", "1979 - 1980", "1981", "1982 - 1984", "1985 - 1986", "1987 - 1988", "1989",
"1990", "1991", "1992 - 1994", "1995 - 1998", "1999 - 2008", "2009", "2010 - 2011", "2012 - 2019"))
Dummy equal to one would return "Expansion" (zero, "Contraction"). I guess the column Duration could be used to get the initial date and final date.
here is an option with tidyverse
. Convert the 'Date' to Date
class, extract the year
from the 'Date', create a grouping column based on the occurrence of numeric values in 'Duration', after grouping by 'grp', summarise
by taking the last
value of 'Duration', while paste
ing the first
and last
of 'year' if there are more than one row in 'grp' or else return the first
'year' and similarly, change the 'State' to a single value 'Expansion/Contraction' if all
the 'State' are 1 or 0
library(dplyr)
library(lubridate)
library(stringr)
df %>%
mutate(Date = as.Date(Date),
year = year(Date),
grp = lag(cumsum(str_detect(Duration, "\\d+")), default = 0)) %>%
group_by(grp) %>%
summarise(Duration = last(Duration),
Period =if(n() > 1) str_c(first(year), last(year), sep=' - ') else
as.character(first(year)),
State = if(all(as.logical(State))) 'Expansion' else 'Contraction' ) %>%
select(State, Duration, Period)
# A tibble: 15 x 3
# State Duration Period
# <chr> <chr> <chr>
# 1 Expansion 5 1974 - 1978
# 2 Contraction 2 1979 - 1980
# 3 Expansion 1 1981
# 4 Contraction 3 1982 - 1984
# 5 Expansion 2 1985 - 1986
# 6 Contraction 2 1987 - 1988
# 7 Expansion 1 1989
# 8 Contraction 1 1990
# 9 Expansion 1 1991
#10 Contraction 3 1992 - 1994
#11 Expansion 4 1995 - 1998
#12 Contraction 10 1999 - 2008
#13 Expansion 1 2009
#14 Contraction 2 2010 - 2011
#15 Expansion 8 2012 - 2019