r, list, dataframe, lapply, lag

How to lag and calculate difference for every data frame in a list?


I have a list containing 981 data frames. Each data.frame has the same structure.

I want to lag one column (called growth) and calculate its change from one observation to the next, separately for each data frame.

I tried lapply but somehow could not get it to work.

my_list <- 
  list(
    data.frame(time = 1:10, growth = rnorm(10, mean = 1.3, sd = 2)),
    data.frame(time = 1:10, growth = rnorm(10, mean = 1.3, sd = 2)),
    data.frame(time = 1:10, growth = rnorm(10, mean = 1.3, sd = 2))
  )

Solution

  • If you cannot share real data, you can create a fake dataset to make the post reproducible.

    If I have understood you correctly, here is what you can do with lapply:

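    # Add a column holding the change in growth from the previous row.
    # diff() returns n - 1 values, so the first row is padded with NA.
    lapply(list_df, function(x) {x$difference <- c(NA, diff(x$growth)); x})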
    
    #[[1]]
    #   growth b difference
    #1       3 a         NA
    #2       8 b          5
    #3       4 c         -4
    #4       7 d          3
    #5       6 e         -1
    #6       1 f         -5
    #7      10 g          9
    #8       9 h         -1
    #9       2 i         -7
    #10      5 j          3
    
    #[[2]]
    #   growth b difference
    #1      10 a         NA
    #2       5 b         -5
    #3       6 c          1
    #4       9 d          3
    #5       1 e         -8
    #6       7 f          6
    #7       8 g          1
    #8       4 h         -4
    #9       3 i         -1
    #10      2 j         -1
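
    As a minimal sketch of the same idea applied to the my_list example from the question: lapply() does not modify the list in place, so the result has to be assigned back. The helper name add_difference and the seed are made up for illustration, and the sketch assumes each data frame is already sorted by time.

    ```r
    # Hypothetical helper (not from the original post): appends the
    # row-to-row change in `growth` as a new column.
    add_difference <- function(df) {
      # diff() returns n - 1 values, so pad with NA to keep n rows
      df$difference <- c(NA, diff(df$growth))
      df
    }

    set.seed(42)  # arbitrary seed, only so the sketch is repeatable
    my_list <- list(
      data.frame(time = 1:10, growth = rnorm(10, mean = 1.3, sd = 2)),
      data.frame(time = 1:10, growth = rnorm(10, mean = 1.3, sd = 2)),
      data.frame(time = 1:10, growth = rnorm(10, mean = 1.3, sd = 2))
    )

    # lapply() returns a new list; assign it back to keep the result
    my_list <- lapply(my_list, add_difference)
    ```

    Afterwards my_list[[1]]$difference holds NA in the first row and growth[i] - growth[i - 1] in every later row.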
    

    The tidyverse way to do the same would be

    library(dplyr)
    library(purrr)
    
    map(list_df, . %>% mutate(difference = c(NA, diff(growth))))
    

    or, equivalently, with dplyr's lag():

    map(list_df, . %>% mutate(difference = growth - dplyr::lag(growth)))
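
    The . %>% mutate(...) shorthand builds a one-argument function. If that reads as too terse, a formula-style lambda should do the same thing. Here is a small sketch with made-up values (not the data from this answer) so the result can be checked by hand; lag is written as dplyr::lag to avoid any mix-up with stats::lag.

    ```r
    library(dplyr)
    library(purrr)

    # Small made-up data so the differences are easy to verify by hand
    list_df <- list(
      data.frame(time = 1:3, growth = c(3, 8, 4)),
      data.frame(time = 1:3, growth = c(10, 5, 6))
    )

    # Formula-style lambda: .x is the current data frame in the list
    result <- map(list_df, ~ mutate(.x, difference = growth - dplyr::lag(growth)))

    result[[1]]$difference
    # NA, 5, -4
    ```

    map() always returns a list, so result has the same length and order as list_df.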
    

    data

    set.seed(123)
    list_df <- list(data.frame(growth = sample(10), b = letters[1:10]), 
                   data.frame(growth = sample(10), b = letters[1:10]))