
Using seq_along and lapply to process multiple dataframes (CAPM)


I have 48 dataframes and I wish to run a linear regression (the CAPM) for each of the stocks in each of the dataframes. Each dataframe contains the same stocks (around 470) plus the S&P 500, and holds 36 months of data. Originally I had one large dataframe, but I have successfully managed to split the data into the 48 dataframes (this might not have been the smartest move, but it is the way I solved the problem).

When I run the following code, it works fine. Note that I have hard-coded Block1.

    beta_results <- lapply(symbols, function(x) {
      temp <- as.data.frame(Block1)
      input <- as.formula(paste("temp$", x, "~ temp$SP500"))
      capm <- lm(input)
      coefficients(capm)
    })

Now, rather than change the code for each of the 48 blocks (i.e. Block1 to Block2, etc.), I attempted the following, which in hindsight is complete rubbish. What I need is a way to increment i from 1 to 48. I had tried to put all the dataframes in a list, but given the way I have the regression working, I would be processing two lists, and that was beyond me.

    beta_results <- lapply(seq_along(symbols), function(i, x) {
      temp <- as.data.frame(paste0("Block", i))
      input <- as.formula(paste("temp$", x, "~ temp$SP500"))
      capm <- lm(input)
      coefficients(capm)
    })
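For reference, the attempt above cannot work as written: paste0("Block", i) only builds the *name* of the object as a character string, so as.data.frame() wraps that string in a one-cell data frame rather than fetching Block1; lapply() passes a single argument per element, so x is never supplied; and seq_along(symbols) counts symbols, not blocks. A minimal sketch of looking an object up by its name with base R's get(), using a tiny hypothetical Block1:

```r
# Tiny hypothetical stand-in for one of the 48 blocks
Block1 <- data.frame(AAPL  = c(0.111, 0.095, -0.009),
                     SP500 = c(0.150, 0.200,  0.150))

name  <- paste0("Block", 1)     # just the character string "Block1"
wrong <- as.data.frame(name)    # a 1 x 1 data frame holding that string
right <- get(name)              # the Block1 object itself

dim(wrong)   # 1 1
dim(right)   # 3 2
```

To fetch several blocks at once by name, mget(paste0("Block", 1:48)) returns them as a named list in numeric order.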

Code for some example data:

    symbols <- c("A", "AAPL", "BRKB")

Block1 to Block48 take the following form:

                         A    AAPL    BRKB   SP500
    2016-04-29  -0.139   0.111   0.122   0.150
    2016-05-31   0.071   0.095   0.330   0.200
    2016-06-30  -0.042  -0.009   0.230   0.150
    2016-07-29   0.090   0.060   0.200   0.100
    2016-08-31   0.023   0.013   0.005   0.050
    2016-09-30   0.065   0.088   0.002   0.100
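To make the example reproducible, the sample rows above can be typed in as an ordinary data frame. This is a sketch that assumes the blocks are plain data frames with the dates as row names; the real blocks may be another class (the as.data.frame() call in the working code hints at that), and only Block1 is rebuilt here.

```r
# Block1 rebuilt from the sample rows shown in the question
Block1 <- data.frame(
  A     = c(-0.139, 0.071, -0.042, 0.090, 0.023, 0.065),
  AAPL  = c( 0.111, 0.095, -0.009, 0.060, 0.013, 0.088),
  BRKB  = c( 0.122, 0.330,  0.230, 0.200, 0.005, 0.002),
  SP500 = c( 0.150, 0.200,  0.150, 0.100, 0.050, 0.100),
  row.names = c("2016-04-29", "2016-05-31", "2016-06-30",
                "2016-07-29", "2016-08-31", "2016-09-30")
)
symbols <- c("A", "AAPL", "BRKB")
```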

Solution

  • Consider a nested lapply, where the outer loop iterates through a list of dataframes and the inner loop through each symbol. The result is a 48-member list, each element containing around 470 sets of beta coefficients (an intercept and a slope per symbol).

    Also, as an aside, it is preferable to keep many similarly structured objects in a list, especially to run the same operations on each and to avoid flooding your global environment (manage 1 list vs. 48 dataframes):

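    # LIST OF DATA FRAMES FROM ALL GLOBAL VARIABLES NAMED Block1, Block2, ...
    # (ls() sorts names alphabetically, so Block10 comes before Block2;
    #  the list elements keep their names, so results can be looked up by block)
    dfList <- mget(ls(pattern = "^Block[0-9]+$"))

    # NESTED LAPPLY: outer loop over data frames, inner loop over symbols
    results_list <- lapply(dfList, function(df) {
      df <- as.data.frame(df)   # as in the question, in case a block is not a plain data frame

      beta_results <- lapply(symbols, function(x) {
        input <- reformulate("SP500", response = x)   # e.g. AAPL ~ SP500
        capm <- lm(input, data = df)
        coefficients(capm)                            # intercept (alpha) and slope (beta)
      })
      setNames(beta_results, symbols)                 # label each coefficient pair by symbol
    })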
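results_list is a list (one element per block) of lists (one element per symbol) of coefficient pairs. If a single table of betas is easier to work with, the slopes can be pulled out with sapply(). The sketch below is self-contained: it rebuilds a small results_list with the same nested lapply, using Block1 from the question's sample rows and a hypothetical second block made from its first five rows.

```r
symbols <- c("A", "AAPL", "BRKB")

# Sample rows from the question
Block1 <- data.frame(
  A     = c(-0.139, 0.071, -0.042, 0.090, 0.023, 0.065),
  AAPL  = c( 0.111, 0.095, -0.009, 0.060, 0.013, 0.088),
  BRKB  = c( 0.122, 0.330,  0.230, 0.200, 0.005, 0.002),
  SP500 = c( 0.150, 0.200,  0.150, 0.100, 0.050, 0.100)
)
# Hypothetical second block, only so there is more than one to loop over
dfList <- list(Block1 = Block1, Block2 = Block1[1:5, ])

results_list <- lapply(dfList, function(df) {
  beta_results <- lapply(symbols, function(x) {
    coefficients(lm(reformulate("SP500", response = x), data = df))
  })
  setNames(beta_results, symbols)
})

# Extract the slope (beta) from every coefficient pair:
# rows = symbols, columns = blocks
beta_matrix <- sapply(results_list, function(res) sapply(res, `[[`, "SP500"))
beta_matrix
```

Because the inner lists are named by symbol and dfList is named by block, the matrix picks up both sets of names, so beta_matrix["AAPL", "Block1"] reads directly as AAPL's beta in the first block.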