Search code examples
rgroup-bydplyrlag

Identify value change (categorical) by group. R


Below you can see a sample of my dataset.

Fac Date        Type        Change  StartDT     EndDT
AAA 1/1/2019    General     0   1/1/2019    1/2/2019
AAA 1/2/2019    General     0   1/1/2019    1/2/2019
AAA 1/3/2019    Special     1   1/3/2019    1/4/2019
AAA 1/4/2019    Special     1   1/3/2019    1/4/2019
AAA 1/5/2019    Intensive   2   1/5/2019    1/5/2019
BBB 1/1/2019    General     0   1/1/2019    1/4/2019
BBB 1/2/2019    General     0   1/1/2019    1/4/2019
BBB 1/3/2019    General     0   1/1/2019    1/4/2019
BBB 1/4/2019    General     0   1/1/2019    1/4/2019
BBB 1/5/2019    Reserve     1   1/5/2019    1/6/2019
BBB 1/6/2019    Reserve     1   1/5/2019    1/6/2019

I would like create a variable to track changes in my Type variable (change). I used to work in Stata and the logic to do this is to first track if the value change compared to the previous record (0/1) for each panel/group and then have a running sum of this value.

bysort Facility (Date): gen byte era = sum(Type != Type[_n-1] & _n > 1) 

How can I do this in R? Also after change variable is created I will need to generate the start and end (min, max) dates for each Fac and Change ("era").

I would appreciate any help! Thanks in advance! Marvin


Solution

  • Here is one solution, using dplyr:

    dat =
      tibble(
        fac = c(rep("A", 10), rep("B", 10)),
        type = sample(1:3, 20, replace = TRUE)
      )
    
    dat %>% 
      group_by(fac) %>% 
      mutate(
        change = case_when(
          type != lag(type) ~ TRUE,
          TRUE ~ FALSE
        ),
        n_change = cumsum(change)
      )
    

    For your code, you could then add:

    group_by(Fac, n_change) %>%
    mutate(
      min_start_date = min(StartDT),
      max_start_date = max(EndDT)
    )