Search code examples
rdplyrrle

Retain First and last date from a grouping in DPLYR after using RLE to find consecutive instances


To explain - I have a dataset in chronological order of game results. Each row shows the team name, the opponent, the date, and if they won or not. I want to group on two levels (both the team and the opponent) to see how many games in a row one team beat another. That I am able to do. What I want to add to this is to also retain the first date of that streak and the last date.

Here's some sample code for you to use:

library(tidyverse)
test <- data.frame(date = c(1:10),
               team = c(rep(c("red", "blue"),5)),
               opponent = c("black", "white", "black", "white", "white",
                            "black", "white", "white", "black", "white"),
               result = c(1,1,1,0,0,1,0,0,1,1))

test %>% 
  group_by(team, opponent) %>% 
  mutate(consec_wins = ifelse(result == 0, 0, sequence(rle(result)$lengths))) %>% 
  summarise(consec_wins = max(consec_wins))

output

# A tibble: 4 × 3
# Groups:   team [2]
  team  opponent consec_wins
  <chr> <chr>          <dbl>
1 blue  black              1
2 blue  white              1
3 red   black              3
4 red   white              0

This code was able to identify that team red beat team black three times in a row but doesn't state the start/end of that streak. I've tried adding a first() and last() function in summarize but noticed it does it on the group level (team and opponent) and not just the range of the winning streak.

I hope this is enough for you to work with. Much appreciated!


Solution

  • Let me know if this works. I'm utilising data.table::rleid() for identifying records of unique streaks.

    library(dplyr)
    library(data.table)
    
    test <- data.frame(date = c(1:10),
                       team = c(rep(c("red", "blue"),5)),
                       opponent = c("black", "white", "black", "white", "white",
                                    "black", "white", "white", "black", "white"),
                       result = c(1,1,1,0,0,1,0,0,1,1))
    output <- test %>%
      group_by(team, opponent) %>%
      mutate(consec_wins = ifelse(result == 0, 0, sequence(rle(result)$lengths))) %>%
      mutate(win_id = if_else(result == 0, 0, data.table::rleid(result))) %>%
      group_by(team, opponent, win_id) %>%
      mutate(min = min(date),
             max = max(date)) %>%
      group_by(team, opponent) %>%
      arrange(desc(consec_wins), desc(result)) %>%
      slice(1) %>%
      select(team, opponent, consec_wins, min, max)
    output
    #> # A tibble: 4 x 5
    #> # Groups:   team, opponent [4]
    #>   team  opponent consec_wins   min   max
    #>   <chr> <chr>          <dbl> <int> <int>
    #> 1 blue  black              1     6     6
    #> 2 blue  white              1     2     2
    #> 3 red   black              3     1     9
    #> 4 red   white              0     5     7
    

    Created on 2023-04-07 with reprex v2.0.2