I don't think the title clearly explain the data calculation I need to do, so I have created a simple reproducible example below:
Here is the input dataframe
structure(list(homePoints = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 2, 2, 4, 4, 4, 4, 4, 4, 5), awayPoints = c(0,
0, 0, 0, 0, 1, 2, 2, 2, 2, 5, 5, 8, 8, 8, 10, 10, 10), homeMargin = c(0,
0, 0, 0, 0, -1, -2, -2, -2, 0, -3, -1, -4, -4, -4, -6, -6, -5
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-18L))
# A tibble: 18 x 3
homePoints awayPoints homeMargin
<dbl> <dbl> <dbl>
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 1 -1
7 0 2 -2
8 0 2 -2
9 0 2 -2
10 2 2 0
11 2 5 -3
12 4 5 -1
13 4 8 -4
14 4 8 -4
15 4 8 -4
16 4 10 -6
17 4 10 -6
18 5 10 -5
And I'm simply attempting to get the following output of "home streaks":
c(-2, 2, -3, 2, -5, 1)
Output explained - The away team starts the game by scoring 2 points (separately 1 point at a time over rows 6 and 7) -2
, then the home team scores 2 points (row 10) 2
, then the away team scores 3 points (row 11) -3
, then the home team scores 2 points (row 12) 2
, then the away team scores 5 points (rows 13, 16) -5
, then the home team scores 1 point (row 18), 1
. Positives for home streaks, negatives for away streaks.
Preferably these streaks would be created as a separate column onto the dataframe called streaks
or something like that, and the dataframe would be filtered (in this instance from 18 to 6 rows), with no real preference as to which rows are lost.
Edit: a particular challenge is that I cannot simply look for changes in the homeMargin column, as a single change in the margin doesn't equal a streak. Rather, streaks involve a continuous change in the margin in the same direction.
Edit 2: my efforts are something along these lines so far:
my_data %>%
dplyr::mutate(streakDirection = c(0, diff(zoo::as.zoo(homeMargin), na.pad = F))) %>%
dplyr::mutate(signChange = c(0, diff(sign(streakDirection))))
However, this doesn't really get me to where I need to go.
Here's one approach:
points %>%
mutate(change_net = homeMargin - lag(homeMargin, default = 0),
direction = sign(change_net)) %>%
filter(direction != 0) %>%
mutate(streak = cumsum(direction != lag(direction, default = 0))) %>%
count(streak, wt = change_net)
# A tibble: 6 x 2
streak n
<int> <dbl>
1 1 -2
2 2 2
3 3 -3
4 4 2
5 5 -5
6 6 1