I have a string listing the terms in office of an individual, e.g.
all_terms <- "2012 to 2024, 2007 to 2007, 2001 to 2003, 2000 to 2009, 2010 to 2011"
I want to know whether the individual served in office continuously, which means:
The end year of one term and the start year of the next may differ by one, i.e. a term ending in 2011 followed by a term starting in 2012 counts as continuous.
Terms that fall within or overlap other terms should not break continuity, i.e. the term 2001 to 2003 above falls within 2000 to 2009 and does not break continuity. Similarly, a term from 2008 to 2013 would not break continuity.
So the example above should be recognized as continuous, whereas "1989 to 2008, 2020 to 2024" should not.
I have come up with this code, but it does not work:
library(stringr)
library(purrr)
library(dplyr)
all_terms <- "2012 to 2024, 2007 to 2007, 2001 to 2003, 2000 to 2009, 2010 to 2011"
# Process terms to extract years and create a data frame
terms_list <- str_split(all_terms, ",\\s*")[[1]]
years <- map(terms_list, ~str_extract_all(.x, "\\d{4}")[[1]])
years_df <- map_df(years, ~data.frame(start = as.numeric(.x[1]), end = as.numeric(.x[2])))
# Sort years by start date
years_df <- years_df %>% arrange(start)
# Adjust end year by adding one for continuity check
years_df$modified_end <- years_df$end + 1
# Check for continuity
is_continuous <- all(c(TRUE, tail(years_df$start, -1) <= head(years_df$modified_end, -1)))
# Results
list(
  is_continuous = is_continuous,
  start_years = min(years_df$start),
  end_years = max(years_df$end)
)
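The attempt above compares each start year only with the end year of the row directly before it. After sorting, 2001 to 2003 comes right after 2000 to 2009. As a result, 2007 is checked against 2003 + 1 = 2004 instead of 2009 + 1, and the check fails even though 2000 to 2009 already covers 2007. Below is a minimal sketch of a fix. It compares each start year with the running maximum of all earlier end years (`cummax()`). It is written in base R so it runs without extra packages, and `check_continuity` is a hypothetical name. Within the original code, the essential change is using `cummax(end) + 1` in place of `end + 1`.

```r
# Hypothetical helper: TRUE when the terms form one unbroken span of years
check_continuity <- function(all_terms) {
  # Split "2012 to 2024, 2007 to 2007, ..." into pairs like c("2012", "2024")
  parts <- strsplit(strsplit(all_terms, ",\\s*")[[1]], "\\s+to\\s+")
  years_df <- data.frame(
    start = as.numeric(vapply(parts, `[`, "", 1)),
    end   = as.numeric(vapply(parts, `[`, "", 2))
  )
  years_df <- years_df[order(years_df$start), ]
  # Latest end year reached by ANY earlier term, not just the previous row
  running_end <- cummax(years_df$end)
  # Continuous if every start is at most one year after that running end
  all(tail(years_df$start, -1) <= head(running_end, -1) + 1)
}

check_continuity("2012 to 2024, 2007 to 2007, 2001 to 2003, 2000 to 2009, 2010 to 2011")
#> [1] TRUE
check_continuity("1989 to 2008, 2020 to 2024")
#> [1] FALSE
```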
We can use cummax() and cumsum(). I created a function that counts the number of separate (non-consecutive) terms. For more details on these functions, refer to this previous answer of mine: Collapse and merge overlapping time intervals.*
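The following sketch shows what cummax() and cumsum() each contribute, using plain vectors. The years are those of `four_term` below, already sorted by start year, and the vector names are illustrative. cummax() tracks the furthest end year reached so far. A start year more than one past that value marks a gap, and cumsum() over the gap flags numbers the resulting continuous terms.

```r
# Terms already sorted by start year (same years as four_term in this answer)
start_yr <- c(2000, 2001, 2007, 2010, 2013)
end_yr   <- c(2005, 2003, 2007, 2011, 2024)

# Furthest end year reached by this term or any earlier one
running_end <- cummax(end_yr)
running_end
#> [1] 2005 2005 2007 2011 2024

# TRUE where a term starts more than one year after every earlier term ended
new_term <- c(FALSE, start_yr[-1] - 1 > running_end[-length(end_yr)])
new_term
#> [1] FALSE FALSE  TRUE  TRUE  TRUE

# Running count of gaps labels each row with the continuous term it belongs to
term_id <- 1 + cumsum(new_term)
term_id
#> [1] 1 1 2 3 4
```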
one_term <- "2012 to 2024, 2007 to 2007, 2001 to 2003, 2000 to 2009, 2010 to 2011"
two_term <- "2013 to 2024, 2007 to 2007, 2001 to 2003, 2000 to 2009, 2010 to 2011"
four_term <- "2013 to 2024, 2007 to 2007, 2001 to 2003, 2000 to 2005, 2010 to 2011"
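library(dplyr)
term_counter <- function(string_dat) {
  # Split "a to b, c to d, ..." into a two-column data frame (V1 = start, V2 = end)
  as.data.frame(
    do.call(rbind,
            strsplit(strsplit(string_dat, ", ")[[1]], " to "))) %>%
    mutate(across(everything(), as.numeric)) %>%
    arrange(V1, V2) %>%
    # A new term begins when the next start year minus one exceeds the latest
    # end year so far; cumsum() turns these gap flags into term numbers and
    # [-n()] drops the trailing NA produced by lead()
    mutate(terms = 1 + c(0, cumsum(lead(V1 - 1) > cummax(V2))[-n()])) %>%
    pull(terms) %>% max()
}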
term_counter(one_term)
#> [1] 1
term_counter(two_term)
#> [1] 2
term_counter(four_term)
#> [1] 4
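The question only asks whether the service was continuous, which is the same as the term count being 1. Here is a dependency-free sketch of the same cummax()/cumsum() idea in base R. `count_terms_base` and `is_continuous_term` are hypothetical names. The count equals one plus the number of gaps, which matches the maximum of `terms` above.

```r
# Base R version of the same gap-counting logic (hypothetical names)
count_terms_base <- function(string_dat) {
  parts <- strsplit(strsplit(string_dat, ",\\s*")[[1]], "\\s+to\\s+")
  v1 <- as.numeric(vapply(parts, `[`, "", 1))  # start years
  v2 <- as.numeric(vapply(parts, `[`, "", 2))  # end years
  ord <- order(v1, v2)
  v1 <- v1[ord]
  v2 <- v2[ord]
  # A gap occurs where the start year minus one exceeds every earlier end year
  gaps <- v1[-1] - 1 > cummax(v2)[-length(v2)]
  1 + sum(gaps)
}

# Continuous service means exactly one merged term
is_continuous_term <- function(string_dat) count_terms_base(string_dat) == 1

is_continuous_term("2012 to 2024, 2007 to 2007, 2001 to 2003, 2000 to 2009, 2010 to 2011")
#> [1] TRUE
is_continuous_term("1989 to 2008, 2020 to 2024")
#> [1] FALSE
```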
If you also want the start, end, and length of each continuous term, you can use the modified version below:
term_counter_mod <- function(string_dat) {
  as.data.frame(
    do.call(rbind,
            strsplit(strsplit(string_dat, ", ")[[1]], " to "))) %>%
    mutate(across(everything(), as.numeric)) %>%
    arrange(V1, V2) %>%
    mutate(terms = 1 + c(0, cumsum(lead(V1 - 1) > cummax(V2))[-n()])) %>%
    # Collapse each group of overlapping/adjacent rows into one span
    # (the .by argument requires dplyr >= 1.1.0)
    summarise(from = min(V1), to = max(V2),
              len = to - from + 1,
              .by = terms)
}
lapply(setNames(list(one_term, two_term, four_term),
                c("one", "two", "four")),
       term_counter_mod)
#> $one
#> terms from to len
#> 1 1 2000 2024 25
#>
#> $two
#> terms from to len
#> 1 1 2000 2011 12
#> 2 2 2013 2024 12
#>
#> $four
#> terms from to len
#> 1 1 2000 2005 6
#> 2 2 2007 2007 1
#> 3 3 2010 2011 2
#> 4 4 2013 2024 12
Created on 2024-04-11 with reprex v2.0.2
* This is not a duplicate of that question.