Search code examples
rfor-loopnested-loops

Find the previous-trial score within a nested dataframe in a loop and save as a new variable


I have a dataframe that currently looks like this:

subjectID Trial Score
1 1 16
1 1 16
1 1 16
1 2 8
1 2 8
1 2 8
1 3 12
1 3 12
1 3 12
2 1 9
2 1 9
2 1 9
2 2 10
2 2 10
2 2 10

I need to create a new column, Previous_Trial_Score, that is simply the score on the last trial for each person. For example:

subjectID Trial Score Previous_Trial_Score
1 1 16 NA
1 1 16 NA
1 1 16 NA
1 2 8 16
1 2 8 16
1 2 8 16
1 3 12 8
1 3 12 8
1 3 12 8
2 1 9 NA
2 1 9 NA
2 1 9 NA
2 2 10 9
2 2 10 9
2 2 10 9

And so on. Trial 1 for each subject will always be NA, as there is no previous trial for that person. I am writing a for-loop to accomplish this, below:

for (myperson in unique(data$subjectID)){
  for (mytrial in unique(data$Trial[data$Trial>1])){

    #Specify the trial and person
    Prev_Score=as.numeric(unique(data[data$subjectID==myperson & data$Trial==mytrial-1, "Score"]))
    

    #Save it to the dataframe
    data[data$subjectID==myperson & data$Trial==mytrial,"Prev_Score"]=Prev_Score
    
    
  }
}

In the above loop, I had to specify as.numeric and unique to get R to return a single value properly. However, when I run the loop, I get this error:

Error: Assigned data `value` must be compatible with existing data.
i Error occurred for column `Prev_Score`.
x Can't convert from <double> to <logical> due to loss of precision.
* Locations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 3...

Does anyone have a solution? I am open to tidyverse/dplyr work-arounds.


Solution

  • We could do a group by summarise to get the lag and then do a join

    library(dplyr)
    df1 %>% 
       group_by(subjectID, Trial) %>%
       summarise(Previous_Trial_Score = last(Score), .groups= 'drop_last') %>% 
       mutate(Previous_Trial_Score = lag(Previous_Trial_Score)) %>%   
       left_join(df1, .)
    

    -output

       subjectID Trial Score Previous_Trial_Score
    1          1     1    16                   NA
    2          1     1    16                   NA
    3          1     1    16                   NA
    4          1     2     8                   16
    5          1     2     8                   16
    6          1     2     8                   16
    7          1     3    12                    8
    8          1     3    12                    8
    9          1     3    12                    8
    10         2     1     9                   NA
    11         2     1     9                   NA
    12         2     1     9                   NA
    13         2     2    10                    9
    14         2     2    10                    9
    15         2     2    10                    9
    

    Or slightly compact option with data.table

    library(data.table)
    setDT(df1)[, Previous_Trial_Score := shift(.SD[, last(Score), 
           Trial]$V1)[Trial], subjectID]
    

    -output

    > df1
        subjectID Trial Score Previous_Trial_Score
            <int> <int> <int>                <int>
     1:         1     1    16                   NA
     2:         1     1    16                   NA
     3:         1     1    16                   NA
     4:         1     2     8                   16
     5:         1     2     8                   16
     6:         1     2     8                   16
     7:         1     3    12                    8
     8:         1     3    12                    8
     9:         1     3    12                    8
    10:         2     1     9                   NA
    11:         2     1     9                   NA
    12:         2     1     9                   NA
    13:         2     2    10                    9
    14:         2     2    10                    9
    15:         2     2    10                    9
    

    data

    df1 <- structure(list(subjectID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 2L, 2L, 2L, 2L, 2L, 2L), Trial = c(1L, 1L, 1L, 2L, 2L, 2L, 
    3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L), Score = c(16L, 16L, 16L, 
    8L, 8L, 8L, 12L, 12L, 12L, 9L, 9L, 9L, 10L, 10L, 10L)), 
    class = "data.frame", row.names = c(NA, 
    -15L))