r min max dates by id and multiple status changes within ID

I have an animal tracking dataset which is as shown below

 Id         Start       Stop          Status
 78122      10/12/1919  10/12/1919    Birth
 78122      1/18/1966   2/2/1972      In
 78122      2/3/1972    9/8/1972      In
 78122      9/9/1972    1/23/1974     In
 78122      1/24/1974   10/22/1975    Out
 78122      10/23/1975  5/4/1979      Out
 78122      5/5/1979    8/29/1980     Out
 78122      8/30/1980   5/14/1988     Out
 78122      5/15/1988   6/18/1988     In
 78122      6/19/1988   1/12/1989     In
 78122      1/13/1989   2/23/1990     In
 78122      2/24/1990   6/15/1991     Out
 78122      6/16/1991   2/11/1993     Out
 78122      2/12/1993   5/3/1994      Out
 78122      5/4/1994    7/27/1994     In
 78122      7/22/1994   1/25/1996     Out
 78122      1/26/1996   11/13/2001    In
 78122      11/14/2001  11/19/2001    In
 78122      11/20/2001  9/1/2009      In
 78122      9/26/2009   9/26/2009     Death

This animal was born in 1919 but moved in and out of its native territory multiple times. What I want to create is a dataset like this. I like to summarize the min(Start) and max(Stop) dates by Status.

For example: there are three rows indicating the animal was in the territory between 1/18/1966 to 1/23/1974.

 Id         Start       Stop          Status
 78122      1/18/1966   2/2/1972      In
 78122      2/3/1972    9/8/1972      In
 78122      9/9/1972    1/23/1974     In

This information should be summarized into 1 single row with min(Start) and max(Stop) like this

 Id         MinStart    MaxStop       Status
 78122      1/18/1966   1/23/1974     In

Again there are four rows indicating the animal was out of territory between 1/24/1974 to 5/14/1988.

 Id         Start       Stop          Status
 78122      1/24/1974   10/22/1975    Out
 78122      10/23/1975  5/4/1979      Out
 78122      5/5/1979    8/29/1980     Out
 78122      8/30/1980   5/14/1988     Out

This information should be summarized into 1 single row with min(Start) and max(Stop) like this

 Id         MinStart    MaxStop       Status
 78122      1/24/1974   5/14/1988     Out

Similarly for other In and Out status. The final dataset should look like this below.

 Id         MinStart    MaxStop       Status
 78122      10/12/1919  10/12/1919    Birth
 78122      1/18/1966   1/23/1974     In
 78122      1/24/1974   5/14/1988     Out
 78122      5/15/1988   2/23/1990     In
 78122      2/24/1990   5/3/1994      Out
 78122      5/4/1994    7/27/1994     In
 78122      7/28/1994   1/25/1996     Out
 78122      1/26/1996   9/1/2009      In
 78122      9/26/2009   9/26/2009     Death

Any suggestions on how to rearrange this dataset based on the criteria above is much apricated. So far I tried

 test1 <- testcase %>% 
          group_by(ID,Status) %>% 
          summarize(MinStart  = min(Start), MaxStop= max(Stop))

but this doesn't seem to work. It just creates one min and stop date for all the In Status and Out Status together. That is incorrect.

Solution

You need some run length encoding. I'm going to use data.table::rleid for convenience, but you can use the base version if you want:

library(data.table)
testcase %>% 
  group_by(Id, RLE = rleid(Status)) %>%
  arrange(Start) %>%
  dplyr::summarise(Start = min(Start), Stop = max(Stop), Status = first(Status))
# A tibble: 9 x 5
# Groups:   Id [1]
     Id   RLE Start      Stop       Status
  <int> <int> <date>     <date>     <chr> 
1 78122     1 1919-10-12 1919-10-12 Birth 
2 78122     2 1966-01-18 1974-01-23 In    
3 78122     3 1974-01-24 1988-05-14 Out   
4 78122     4 1988-05-15 1990-02-23 In    
5 78122     5 1990-02-24 1994-05-03 Out   
6 78122     6 1994-05-04 1994-07-27 In    
7 78122     7 1994-07-22 1996-01-25 Out   
8 78122     8 1996-01-26 2009-09-01 In    
9 78122     9 2009-09-26 2009-09-26 Death

Note that I converted your dates into class date, which I will leave to you. Otherwise they don't sort correctly.

And here is the group_by call without data.table

...
  group_by(Id, RLE = with(rle(Status), rep(seq_along(lengths), lengths))) %>%
...