Search code examples
rtime-serieslagdplyrlead

Sequencing R, Time Series Data


I am looking to add a new column to my current data frame which adds a new sequencing number based on a string of events in a football match.

This my current data frame

 head(test_P)
 index        team.name      possession_team.name  minute second period possession     type.name
1      5      Cardiff City         Cardiff City      0      0      1          2          Pass
2      6      Cardiff City         Cardiff City      0      2      1          2 Ball Receipt*
3      7      Cardiff City         Cardiff City      0      2      1          2         Carry
4      8      Cardiff City         Cardiff City      0      3      1          2          Pass
5      9      Cardiff City         Cardiff City      0      6      1          2 Ball Receipt*
6     10 Preston North End         Cardiff City      0      6      1          2          Duel
7     11 Preston North End         Cardiff City      0      6      1          2          Pass
8     12 Preston North End         Cardiff City      0      8      1          2 Miscontrol
9     13      Cardiff City         Cardiff City      0      8      1          2          Pass
10    14      Cardiff City         Cardiff City      0      9      1          2 Ball Receipt*
11    15      Cardiff City         Cardiff City      0      9      1          2         Cross
12    16 Preston North End         Cardiff City      0     10      1          2 Clearance
13    17      Cardiff City         Cardiff City      0     11      1          2          Pass
14    18      Cardiff City         Cardiff City      0     13      1          2 Ball Receipt*
15    19 Preston North End    Preston North End      0     13      1          3 Ball Recovery
16    20 Preston North End    Preston North End      0     13      1          3         Carry
17    21 Preston North End    Preston North End      0     21      1          3          Pass
18    22 Preston North End    Preston North End      0     22      1          3 Ball Receipt*.   

However, I want to add an additional column name after possession called sequence which labels the sequence number of a possession.

Every new possession should start with sequence with value of 1

But if the opposition breaks that sequence with an event/events and the possession value is still the same, the next time the possession team touches the ball it should be a new sequence number e.g 2 or if multiple breaks 3,4 etc

The opposition events should be coded with the same sequence number as the one to which they have broke

E.g Data Below

   index        team.name      possession_team.name  minute second period possession type.name sequence
1      5      Cardiff City         Cardiff City      0      0      1          2          Pass         1
2      6      Cardiff City         Cardiff City      0      2      1          2 Ball Receipt          1
3      7      Cardiff City         Cardiff City      0      2      1          2         Carry         1
4      8      Cardiff City         Cardiff City      0      3      1          2          Pass         1
5      9      Cardiff City         Cardiff City      0      6      1          2 Ball Receipt*         1
6     10 Preston North End         Cardiff City      0      6      1          2          Duel         1
7     11 Preston North End         Cardiff City      0      6      1          2          Pass         1
8     12 Preston North End         Cardiff City      0      8      1          2 Miscontrol            1
9     13      Cardiff City         Cardiff City      0      8      1          2          Pass         2
10    14      Cardiff City         Cardiff City      0      9      1          2 Ball Receipt          2
11    15      Cardiff City         Cardiff City      0      9      1          2         Cross         2
12    16 Preston North End         Cardiff City      0     10      1          2 Clearance             2
13    17      Cardiff City         Cardiff City      0     11      1          2          Pass         3
14    18      Cardiff City         Cardiff City      0     13      1          2 Ball Receipt          3
15    19 Preston North End    Preston North End      0     13      1          3 Ball Recovery         1
16    20 Preston North End    Preston North End      0     13      1          3         Carry         1
17    21 Preston North End    Preston North End      0     21      1          3          Pass         1
18    22 Preston North End    Preston North End      0     22      1          3 Ball Receipt          1

I have tried lead and lag functions combined with ifelse statements but can't seem to get the data to work

     test <- test  %>% mutate(P = ifelse(dplyr::lag(team.id)!=team.id & dplyr::lag(possession) == possession, dplyr::lag(seq_id) + 1,
                                                      ifelse(dplyr::lead(team.id)!=team.id & dplyr::lead(possession)!=possession , seq_id, 1))) 

Any help would be greatly appreciated and apologies for the untidiness of this question


Solution

  • The following feels very hacky, but might do the job.

    The logic is the following:

    • generate a flip variable that is 1/2 every time team.name "flips" and 0 otherwise.
    • generate cum_sum_flip, the cumulative sum over flip. Added 1 so that it starts at 1 and not 0.
    • generate sequence by taking the floor() from cum_sum_flip so that at every second flip, the sequence is increased.

    Notes:

    • I left the intermediate variables for easier understanding, you can consolidate it a bit.
    • Depending how your data is structured, you might have to group by match or something to make sure that when a whole new match starts it starts counting from 0 again.
    • This solution is not very robust and has some assumptions on the structure of the data. Please check for edge cases.
    library(dplyr)
    
    test_P %>% 
      mutate(flip = (lag(team.name) != team.name) %>% replace_na(0) * 1/2,
             .after = possession
      ) %>% group_by(possession) %>% 
      mutate(cum_sum_flip = cumsum(flip)+1, 
             sequence = floor(cum_sum_flip),
             .after = possession
      ) 
    

    Results:

    # A tibble: 18 x 11
    # Groups:   possession [2]
       index team.name         possession_team.name minute second period possession cum_sum_flip sequence  flip type.name    
       <dbl> <chr>             <chr>                 <dbl>  <dbl>  <dbl>      <dbl>        <dbl>    <dbl> <dbl> <chr>        
     1     5 Cardiff City      Cardiff City              0      0      1          2          1          1   0   Pass         
     2     6 Cardiff City      Cardiff City              0      2      1          2          1          1   0   Ball Receipt*
     3     7 Cardiff City      Cardiff City              0      2      1          2          1          1   0   Carry        
     4     8 Cardiff City      Cardiff City              0      3      1          2          1          1   0   Pass         
     5     9 Cardiff City      Cardiff City              0      6      1          2          1          1   0   Ball Receipt*
     6    10 Preston North End Cardiff City              0      6      1          2          1.5        1   0.5 Duel         
     7    11 Preston North End Cardiff City              0      6      1          2          1.5        1   0   Pass         
     8    12 Preston North End Cardiff City              0      8      1          2          1.5        1   0   Miscontrol   
     9    13 Cardiff City      Cardiff City              0      8      1          2          2          2   0.5 Pass         
    10    14 Cardiff City      Cardiff City              0      9      1          2          2          2   0   Ball Receipt*
    11    15 Cardiff City      Cardiff City              0      9      1          2          2          2   0   Cross        
    12    16 Preston North End Cardiff City              0     10      1          2          2.5        2   0.5 Clearance    
    13    17 Cardiff City      Cardiff City              0     11      1          2          3          3   0.5 Pass         
    14    18 Cardiff City      Cardiff City              0     13      1          2          3          3   0   Ball Receipt*
    15    19 Preston North End Preston North End         0     13      1          3          1.5        1   0.5 Ball Recovery
    16    20 Preston North End Preston North End         0     13      1          3          1.5        1   0   Carry        
    17    21 Preston North End Preston North End         0     21      1          3          1.5        1   0   Pass         
    18    22 Preston North End Preston North End         0     22      1          3          1.5        1   0   Ball Receipt*
    

    data

    test_P <- tribble(
    ~index, ~team.name, ~possession_team.name, ~minute, ~second, ~period, ~possession, ~type.name, 
    5 ,      "Cardiff City",  "Cardiff City",       0,        0,       1,           2,  "Pass",
    6 ,      "Cardiff City",  "Cardiff City",       0,        2,       1,           2,  "Ball Receipt*",
    7 ,      "Cardiff City",  "Cardiff City",       0,        2,       1,           2,  "Carry",
    8 ,      "Cardiff City",  "Cardiff City",       0,        3,       1,           2,  "Pass",
    9 ,      "Cardiff City",  "Cardiff City",       0,        6,       1,           2,  "Ball Receipt*",
    10,  "Preston North End", "Cardiff City",       0,        6,       1,           2,  "Duel",
    11,  "Preston North End", "Cardiff City",       0,        6,       1,           2,  "Pass",
    12,  "Preston North End", "Cardiff City",       0,        8,       1,           2,  "Miscontrol",
    13,       "Cardiff City", "Cardiff City",       0,        8,       1,           2,  "Pass",
    14,       "Cardiff City", "Cardiff City",       0,        9,       1,           2,  "Ball Receipt*",
    15,       "Cardiff City", "Cardiff City",       0,        9,       1,           2,  "Cross",
    16,  "Preston North End", "Cardiff City",       0,       10,       1,           2,  "Clearance",
    17,       "Cardiff City", "Cardiff City",       0,       11,       1,           2,  "Pass",
    18,       "Cardiff City", "Cardiff City",       0,       13,       1,           2,  "Ball Receipt*",
    19,  "Preston North End", "Preston North End",  0,       13,       1,           3,  "Ball Recovery",
    20,  "Preston North End", "Preston North End",  0,       13,       1,           3,  "Carry",
    21,  "Preston North End", "Preston North End",  0,       21,       1,           3,  "Pass",
    22,  "Preston North End", "Preston North End",  0,       22,       1,           3,  "Ball Receipt*")