I have 48 dataframes and I want to fit a linear regression (the CAPM) for each stock in each dataframe. Each dataframe contains the same set of roughly 470 stocks plus the S&P 500, and holds 36 months' worth of data. Originally I had one large dataframe, which I have since split into the 48 dataframes (this might not have been the smartest move, but it is the way I solved the problem).
When I run the following code it works fine, but note that I have hard-coded Block1.
beta_results <- lapply(symbols, function(x) {
  temp <- as.data.frame(Block1)
  input <- as.formula(paste("temp$",x, "~ temp$SP500" ))
  capm <- lm(input)
  coefficients(capm)
})
Now, rather than change the code for each of the 48 blocks (i.e. Block1 to Block2 etc.), I attempted the following, which in hindsight is complete rubbish. What I need is a way to increment i from 1 to 48. I had tried to put all the dataframes in a list, but given the way I have the regression working I would be processing two lists, and that was beyond me.
beta_results <- lapply(seq_along(symbols), function(i,x) {
  temp <- as.data.frame(paste0("Block",i))
  input <- as.formula(paste("temp$",x, "~ temp$SP500" ))
  capm <- lm(input)
  coefficients(capm)
})
Code for some example data:
symbols <- c("A", "AAPL", "BRKB")
Block1 to BlockN take the form:
                A   AAPL  BRKB SP500
2016-04-29 -0.139  0.111 0.122 0.150
2016-05-31  0.071  0.095 0.330 0.200
2016-06-30 -0.042 -0.009 0.230 0.150
2016-07-29  0.090  0.060 0.200 0.100
2016-08-31  0.023  0.013 0.005 0.050
2016-09-30  0.065  0.088 0.002 0.100
Consider a nested lapply, where the outer loop iterates through a list of dataframes and the inner loop through each symbol. The result is a 48-member list, each element containing 470 sets of beta coefficients.
Also, as an aside, it is preferable to keep many similarly structured objects in a list, especially when running the same operation on each, rather than flooding your global environment (manage 1 list vs. 48 dataframes):
# LIST OF DATA FRAMES FROM ALL GLOBAL VARIABLES NAMED Block1, Block2, ...
dfList <- mget(ls(pattern="^Block[0-9]+$"))
# NESTED LAPPLY: OUTER OVER DATA FRAMES, INNER OVER SYMBOLS
results_list <- lapply(dfList, function(df) {
  beta_results <- lapply(symbols, function(x) {
    # reformulate() EXPECTS A CHARACTER TERM LABEL; BUILDS e.g. A ~ SP500
    input <- reformulate("SP500", response=x)
    capm <- lm(input, data=df)
    coefficients(capm)
  })
})
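To work with the output afterwards, it may help to name the inner list by symbol and collapse the slopes into one matrix. The following is only a minimal sketch on made-up data: `make_block()` and `beta_matrix` are hypothetical names, and the two toy blocks hold simulated returns, not the asker's data.

```r
# Toy data: two hypothetical blocks of simulated monthly returns (assumption)
set.seed(1)
symbols <- c("A", "AAPL", "BRKB")
make_block <- function(n = 36) {
  sp <- rnorm(n, 0.01, 0.04)
  data.frame(
    A     = 0.9 * sp + rnorm(n, 0, 0.02),
    AAPL  = 1.2 * sp + rnorm(n, 0, 0.02),
    BRKB  = 0.7 * sp + rnorm(n, 0, 0.02),
    SP500 = sp
  )
}
dfList <- list(Block1 = make_block(), Block2 = make_block())

# Same nested lapply as above, with the inner list named by symbol
results_list <- lapply(dfList, function(df) {
  fits <- lapply(symbols, function(x) {
    coefficients(lm(reformulate("SP500", response = x), data = df))
  })
  setNames(fits, symbols)
})

# Keep only the slope on SP500 (the beta): rows = symbols, columns = blocks
beta_matrix <- sapply(results_list, function(block) {
  sapply(block, function(cf) cf[["SP500"]])
})
```

With the real data, `beta_matrix` would have one row per stock and one column per block, so a single beta can be looked up as `beta_matrix["AAPL", "Block1"]`.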