To explain - I have a dataset in chronological order of game results. Each row shows the team name, the opponent, the date, and if they won or not. I want to group on two levels (both the team and the opponent) to see how many games in a row one team beat another. That I am able to do. What I want to add to this is to also retain the first date of that streak and the last date.
Here's some sample code for you to use:
library(tidyverse)
test <- data.frame(date = c(1:10),
team = c(rep(c("red", "blue"),5)),
opponent = c("black", "white", "black", "white", "white",
"black", "white", "white", "black", "white"),
result = c(1,1,1,0,0,1,0,0,1,1))
test %>%
group_by(team, opponent) %>%
mutate(consec_wins = ifelse(result == 0, 0, sequence(rle(result)$lengths))) %>%
summarise(consec_wins = max(consec_wins))
output
# A tibble: 4 × 3
# Groups: team [2]
team opponent consec_wins
<chr> <chr> <dbl>
1 blue black 1
2 blue white 1
3 red black 3
4 red white 0
This code was able to identify that team red beat team black three times in a row but doesn't state the start/end of that streak. I've tried adding a first() and last() function in summarize but noticed it does it on the group level (team and opponent) and not just the range of the winning streak.
I hope this is enough for you to work with. Much appreciated!
Let me know if this works. I'm utilising data.table::rleid()
for identifying records of unique streaks.
library(dplyr)
library(data.table)
test <- data.frame(date = c(1:10),
team = c(rep(c("red", "blue"),5)),
opponent = c("black", "white", "black", "white", "white",
"black", "white", "white", "black", "white"),
result = c(1,1,1,0,0,1,0,0,1,1))
output <- test %>%
group_by(team, opponent) %>%
mutate(consec_wins = ifelse(result == 0, 0, sequence(rle(result)$lengths))) %>%
mutate(win_id = if_else(result == 0, 0, data.table::rleid(result))) %>%
group_by(team, opponent, win_id) %>%
mutate(min = min(date),
max = max(date)) %>%
group_by(team, opponent) %>%
arrange(desc(consec_wins), desc(result)) %>%
slice(1) %>%
select(team, opponent, consec_wins, min, max)
output
#> # A tibble: 4 x 5
#> # Groups: team, opponent [4]
#> team opponent consec_wins min max
#> <chr> <chr> <dbl> <int> <int>
#> 1 blue black 1 6 6
#> 2 blue white 1 2 2
#> 3 red black 3 1 9
#> 4 red white 0 5 7
Created on 2023-04-07 with reprex v2.0.2