Search code examples
rdataframedatedummy-variable

Creating new data.frame considering a dummy variable for period and a third variable


Using this data (*corrected)

structure(list(Date = structure(c(1461, 1826, 2191, 2557, 2922, 
3287, 3652, 4018, 4383, 4748, 5113, 5479, 5844, 6209, 6574, 6940, 
7305, 7670, 8035, 8401, 8766, 9131, 9496, 9862, 10227, 10592, 
10957, 11323, 11688, 12053, 12418, 12784, 13149, 13514, 13879, 
14245, 14610, 14975, 15340, 15706, 16071, 16436, 16801, 17167, 
17532, 17897), class = "Date"), State = c(1, 1, 1, 1, 1, 0, 0, 
1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1), Duration = c("", 
"", "", "", "5", "", "2", "1", "", "", "3", "", "2", "", "2", 
"1", "1", "1", "", "", "3", "", "", "", "4", "", "", "", "", 
"", "", "", "", "", "10", "1", "", "2", "", "", "", "", "", "", 
"", "8")), row.names = c(NA, -46L), class = "data.frame")

I would like to reach this new data.frame:

newdf <- data.frame(State = c("Expansion", "Contraction", "Expansion", "Contraction", "Expansion", "Contraction",
                             "Expansion", "Contraction", "Expansion", "Contraction", "Expansion", "Contraction",
                             "Expansion", "Contraction", "Expansion"),
                   Duration = c("5", "2", "1", "3", "2", "2", "1", "1", "1", "3", "4", "10", "1", "2", "8"),
                   Period = c("1974 - 1978", "1979 - 1980", "1981", "1982 - 1984", "1985 - 1986", "1987 - 1988", "1989",
                              "1990", "1991", "1992 - 1994", "1995 - 1998", "1999 - 2008", "2009", "2010 - 2011", "2012 - 2019"))

Dummy equal to one would return "Expansion" (zero, "Contraction"). I guess the column Duration could be used to get the initial date and final date.


Solution

  • here is an option with tidyverse. Convert the 'Date' to Date class, extract the year from the 'Date', create a grouping column based on the occurrence of numeric values in 'Duration', after grouping by 'grp', summarise by taking the last value of 'Duration', while pasteing the first and last of 'year' if there are more than one row in 'grp' or else return the first 'year' and similarly, change the 'State' to a single value 'Expansion/Contraction' if all the 'State' are 1 or 0

    library(dplyr)
    library(lubridate)
    library(stringr)
    df %>% 
        mutate(Date = as.Date(Date), 
               year = year(Date),
           grp = lag(cumsum(str_detect(Duration, "\\d+")), default = 0)) %>%
        group_by(grp) %>%
        summarise(Duration = last(Duration), 
                  Period =if(n() > 1) str_c(first(year), last(year), sep=' - ') else
              as.character(first(year)), 
           State = if(all(as.logical(State))) 'Expansion' else 'Contraction' ) %>% 
        select(State, Duration, Period)
    # A tibble: 15 x 3
    #   State       Duration Period     
    #   <chr>       <chr>    <chr>      
    # 1 Expansion   5        1974 - 1978
    # 2 Contraction 2        1979 - 1980
    # 3 Expansion   1        1981       
    # 4 Contraction 3        1982 - 1984
    # 5 Expansion   2        1985 - 1986
    # 6 Contraction 2        1987 - 1988
    # 7 Expansion   1        1989       
    # 8 Contraction 1        1990       
    # 9 Expansion   1        1991       
    #10 Contraction 3        1992 - 1994
    #11 Expansion   4        1995 - 1998
    #12 Contraction 10       1999 - 2008
    #13 Expansion   1        2009       
    #14 Contraction 2        2010 - 2011
    #15 Expansion   8        2012 - 2019