r, lag, dplyr

Create lags relative to whole change within group


I am trying to create a variable that expresses each lagged change in another variable as a share of that variable's total change within its group.

Let's use this example dataframe:

game_data <- data.frame(player = c(1,1,1,2,2,2,3,3,3), level = c(1,2,3,1,2,3,1,2,3), score=as.numeric(c(0,150,170,80,100,110,75,100,0)))
game_data
  player level score
1      1     1     0
2      1     2   150
3      1     3   170
4      2     1    80
5      2     2   100
6      2     3   110
7      3     1    75
8      3     2   100
9      3     3     0

I tried the following. Computing the lagged difference works, but the new variable that should express each difference relative to the player's total change comes out as NA in every row:

library(dplyr)

result <-
  game_data %>%
  group_by(player) %>%
  mutate(
    lag_score = score - dplyr::lag(score, n = 1, default = NA),
    lag_score_relative = lag_score / sum(lag_score))

result
# A tibble: 9 x 5
# Groups:   player [3]
  player level score lag_score lag_score_relative
   <dbl> <dbl> <dbl>     <dbl>              <dbl>
1      1     1     0        NA                 NA
2      1     2   150       150                 NA
3      1     3   170        20                 NA
4      2     1    80        NA                 NA
5      2     2   100        20                 NA
6      2     3   110        10                 NA
7      3     1    75        NA                 NA
8      3     2   100        25                 NA
9      3     3     0      -100                 NA

For example, for player 1 the result should be: level 1: NA/170 = NA; level 2: 150/170; level 3: 20/170.

Thanks in advance, I hope anyone can help.


Solution

  • Summing the lagged scores includes the NA from each player's first level, so sum() returns NA. Dividing by NA then yields NA in every row. To avoid this, set na.rm = TRUE in the call to sum() so that NAs are excluded from the sum:

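    library(dplyr)

    game_data <- data.frame(player = c(1,1,1,2,2,2,3,3,3), level = c(1,2,3,1,2,3,1,2,3),
      score = as.numeric(c(0,150,170,80,100,110,75,100,0)))

    game_data %>%
      group_by(player) %>%
      mutate(
        # change in score from the previous level (NA at each player's first level)
        lag_score = score - dplyr::lag(score, n = 1, default = NA),
        # share of the player's total change; na.rm = TRUE drops the leading NA
        lag_score_relative = lag_score / sum(lag_score, na.rm = TRUE))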
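
The NA propagation described above can be seen in isolation with a small base R sketch. It uses only player 1's lagged differences from the question; the variable names `total` and `relative` are illustrative and not part of the original code:

```r
# Lagged differences for player 1; level 1 has no previous level, hence NA.
lag_score <- c(NA, 150, 20)

# A single NA makes the whole sum NA, so every quotient would be NA as well.
sum(lag_score)                              # NA

# With na.rm = TRUE the NA is dropped before summing.
total <- sum(lag_score, na.rm = TRUE)       # 170

# The first element stays NA (NA / 170); the rest are shares of the total change.
relative <- lag_score / total
relative
```

Note that the total change can be negative or zero. For player 3 the differences sum to 25 + (-100) = -75, so the relative values change sign, and a group whose differences sum to 0 would produce Inf or NaN from the division.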