Search code examples
rdataframedplyrfiltering

Filter previous data in grouped dataframe using dplyr


Given the data.frame below, how can I obtain a genre history (by player) for each game deemed interesting?

player <- c(rep (1,3), rep(2,4))
game <- c(seq(1:3), seq(1:4))
genre <- c("JnR", "Ego", "RPG", "RPG", "Sports", "JnR", "Sim")
interesting <- c("no", rep(c("no","yes"),3))
                 
playerhist <- data.frame (player, game, genre, interesting)

> playerhist

  player game  genre interesting
1      1    1    JnR          no
2      1    2    Ego          no
3      1    3    RPG         yes
4      2    1    RPG          no
5      2    2 Sports         yes
6      2    3    JnR          no
7      2    4    Sim         yes

Desired output:

  player game genre_history
1      1    1           JnR
2      1    2           Ego
3      2    1           RPG
4      2    1           RPG
5      2    2        Sports
6      2    3           JnR

So for each interesting game I'd like to include all the previous rows for the corresponding player. Including information from the 'interesting' column is not required as shown in the desired output, but if solutions include it anyway, that would also be fine. This column would read c('no','no','no','no','yes','no').

A solution using dplyr would be preferred.


Solution

  • I finally found the solution. Here it is for those trying to solve a similar problem:

    library(dplyr)
    
    #-------------------------------
    # #1 number of interesting games for each player
    # #2 splitting df into list of dfs for each player
    # #3 copy & append df for -> max(.$int_game) times 
    # #4 numbering of dfs using a new number every time game == 1
    # #5 splitting df into list of dfs for each player & df_num
    # #6 count occurrences of interesting == 'yes'
    # #7 set cutpoint to slice determined by cut1 == df_num for the first time
    # #8 slice from row 1:row at which cut2 == 1 and omit the last row
    #-------------------------------
    
    gamehist <- playerhist %>%
      group_by (player) %>%
      mutate (int_game = cumsum(str_detect(interesting, "yes"))) %>% #1
      ungroup () %>%
      base::split (., .$player, drop = FALSE) %>% #2 
      lapply (., function (df) { 
        df %>% 
          .[rep(1:nrow(.), max(.$int_game)),] %>% #3 
          mutate (df_num = cumsum(.$game == 1)) %>% #4 
          return ()
        }) %>% 
      bind_rows () %>%
      base::split (., list(.$player, .$df_num), drop = TRUE) %>% #5
      lapply (., function (df2) {
        df2 %>% 
          mutate (cut1 = cumsum (str_detect(interesting, "yes")), #6
                  cut2 = if_else (cut1 == df_num, 1, 0)) %>% #7
          slice (1:which(cut2 == 1)-1) %>% #8
          return ()
        }) %>% 
      bind_rows () %>%
      select (player, game, genre)
    
    gamehist
    
    # A tibble: 6 × 3
      player  game genre 
       <dbl> <int> <chr> 
    1      1     1 JnR   
    2      1     2 Ego   
    3      2     1 RPG   
    4      2     1 RPG   
    5      2     2 Sports
    6      2     3 JnR