Search code examples
rgroup

How do I check a state-year group contains a proportion lesser or equal to 0.60?


I have the following data frame:

structure(list(state = c("CA", "CA", "CA", "CA", "CA", "CA", 
"WA", "WA", "WA", "PA", "PA"), prop = c(0.4, 0.5, 0.3, NA, 0.61, 
0.3, 0.76, 0.4, 0.35, 0.21, NA), year = c(2000, 2000, 2001, 2001, 
2001, 2002, 2002, 2002, 2004, 2000, 2005)), class = "data.frame", row.names = c(NA, 
-11L))

I want to check if the previous state-year group in the dataset contains a prop lesser or equal than 0.6.

So, my desired output is:

structure(list(state = c("CA", "CA", "CA", "CA", "CA", "CA", 
"WA", "WA", "WA", "PA", "PA"), prop = c(0.4, 0.5, 0.3, NA, 0.61, 
0.3, 0.76, 0.4, 0.35, 0.21, NA), year = c(2000, 2000, 2001, 2001, 
2001, 2002, 2002, 2002, 2004, 2000, 2005), competitive = c(NA, 
NA, 1, 1, 1, 0, NA, NA, 0, NA, 1)), class = "data.frame", row.names = c(NA, 
-11L))

As you can tell, since there is no data for CA-1999, the competitive column says NA. And, for CA-2001, since all the prop in CA-2000 are less than 0.6, they are all coded as competitive==1. And, for CA-2002, since CA-2001 contains one observation greater than 0.6, competitiveness==1. And same for PA. PA-2000 is the state-year preceding PA-2005 in the dataset, so the competitive value for PA-2005 is based on PA-2000.

I've tried many different methods and keep getting odd results, such as:

df <- df %>%
  arrange(state, year) %>%
  group_by(state) %>%
  mutate(
    prev = lag(prop <= 0.60),
    competitive = ifelse(is.na(prev), NA, prev),
    competitive = ifelse(any(prop >= 0.60), 0, 1)
    )

This is only taking the previous entry, and doesn't consider the entire group.


Solution

  • Here's one method

    
    input %>% 
      summarize(is_comp = all(prop<.6), .by=c(state, year)) %>% 
      mutate(is_prev_comp = lag(is_comp)+0, .by=c(state)) %>% 
      select(-is_comp) %>% 
      right_join(input)
    
    

    which returns

    #    state year is_prev_comp prop
    # 1     CA 2000           NA 0.40
    # 2     CA 2000           NA 0.50
    # 3     CA 2001            1 0.30
    # 4     CA 2001            1   NA
    # 5     CA 2001            1 0.61
    # 6     CA 2002            0 0.30
    # 7     WA 2002           NA 0.76
    # 8     WA 2002           NA 0.40
    # 9     WA 2004            0 0.35
    # 10    PA 2000           NA 0.21
    # 11    PA 2005            1   NA