Search code examples
rfor-loopregressionbetarollapply

Roll regression for 4 years of daily data which moves one month ahead for each new regression and for different dependent variables


I have 5 independent variables (Columns B-F in attached data) and some dependent variables (columns G-M in the attached data) and I need to do multiple regressions for each of the dependent variable against all independent ones. The regressions have to have a window of 4 years of data and they have to move one month ahead for each new estimation. I need to extract the coefficients and make vasicek adjustment for each one (except the intercept). That adjustment is just:

adjustment of betas

The data looks like

Data organization

And the whole data is:

Data.xls

Where independent variables are placed in columns B-F and dependent variables are placed in columns G-M. I have being struggling with this problem and I have built two parts of code. First I reached to extract coefficients for regressions of each dependent variable and adjusted them according to vasicek adjustment but without taking the mobile windows I need:

depvar <- c("LYB_UN_Equity" ,"AXP_UN_Equity", "VZ_UN_Equity", "AVGO_UW_Equity", "BA_UN_Equity", "CAT_UN_Equity", "JPM_UN_Equity")
regresults <- lapply(depvar, function(dv) {
tmplm <- lm(get(dv) ~ Mkt + SMB + HML + RMW + CMA, data=newdata
#     ,subset=(Newdata$Fecha > "1996-01-01" & Newdata$Fecha < "1999-12-31"), na.action = na.exclude )
 k=length(tmplm$cofficients)-1
 SSE=sum(tmplm$residuals**2)
 n=length(tmplm$residuals)
 SE=sqrt(SSE/(n-(1-k))
coef(tmplm)*(summary(tmplm)$coef[,2]/SE+summary(tmplm)$coef[,2]) +coef(tmplm)*(SE/SE+summary(tmplm)$coef[,2])
})


allresults <- data.frame(depvar = depvar, 
                     do.call(rbind, regresults))
names(allresults)[2] <- "intercept"
allresults}

It worked, but as I said, I need rolling windows of 4 years of daily data which moves one month ahead for each new estimation so I tried using nested for loop and it did not work:

for (j in 1:7) {
for (i in 1:length(newdata)) {
 #try(
  Model<-lm(newdata[seq(i,1056,24),j+6] ~ newdata[seq(i,1056,24), 2:6])
  #, silent=T)
betas <- as.matrix(coefficients(Model))
}}

The error is:

Error in model.frame.default(formula = newdata[seq(i, 1056, 24), j + 6] ~  :  invalid type (list) for variable 'newdata[seq(i, 1056, 24), j + 6]'

I am a beginner and I really appreciate your help


Solution

  • There isn't enough data in the question to run 4 years and the values of the dependent variables are missing so here is a simplified example using a w of 3 months (rather than 4 years) and a simplified set of statistics that can be adapted by changing the inputs and reg.

    Note that yearmon class stores dates consisting of only year and month as year + fraction where fraction = 0, 1/12, ..., 11/12 for Jan, Feb, ..., Dec so the length of an interval of w months is w/12.

    library(zoo)
    
    # inputs
    set.seed(123)
    ndata <- data.frame(date = as.Date("2000-01-01") + 0:365, 
      z = rnorm(366))
    A <- sqrt(0:365)
    B <- (0:365)^0.25
    w <- 3 # number of trailing months to regress over
    depvars <- c("A", "B")
    indep <- c("date", "z")
    
    reg <- function(ym_, depvar, indep, data, w, ym) {
      ok <- ym > ym_ - w/12 & ym <= ym_
      fo <- reformulate(indep, depvar)
      fm <- lm(fo, data, subset = ok)
      co <- coef(fm)
      n <- nobs(fm)
      c(co, n = n)
    }
    
    ym <- as.yearmon(ndata$date)
    ym_u <- tail(unique(ym), -(w-1))
    
    L <- Map(function(depvar) {
      data.frame(yearmon = ym_u, 
        t(sapply(ym_u, reg, 
          depvar = depvar, indep = indep, data = ndata, w = w, ym = ym)), 
        check.names = FALSE)
    }, depvars)
    
    L
    

    giving the following list of data frames where yearmon is the year and month of the last month in the w month period over which the regression is performed and n is the number of days in that period.

    $A
        yearmon (Intercept)       date             z  n
    1  Mar 2000   -931.0836 0.08520186 -3.783475e-02 91
    2  Apr 2000   -645.7504 0.05930666  5.638294e-03 90
    3  May 2000   -536.6141 0.04942836  3.528984e-03 92
    4  Jun 2000   -468.3192 0.04326379 -6.769498e-03 91
    5  Jul 2000   -420.6956 0.03897671 -7.307754e-05 92
    6  Aug 2000   -384.5289 0.03573000  1.343427e-03 92
    7  Sep 2000   -356.8805 0.03325475 -1.272157e-03 92
    8  Oct 2000   -333.4633 0.03116400  1.980825e-03 92
    9  Nov 2000   -314.3980 0.02946651  2.223839e-04 91
    10 Dec 2000   -298.0596 0.02801567 -2.949753e-04 92
    
    $B
        yearmon (Intercept)        date             z  n
    1  Mar 2000  -206.66238 0.019006840 -7.802128e-03 91
    2  Apr 2000  -110.66468 0.010294703  1.301456e-03 90
    3  May 2000   -83.11581 0.007801199  8.920903e-04 92
    4  Jun 2000   -67.34099 0.006377318 -1.520903e-03 91
    5  Jul 2000   -57.03138 0.005449255 -1.435477e-05 92
    6  Aug 2000   -49.58352 0.004780660  2.702669e-04 92
    7  Sep 2000   -44.11908 0.004291454 -2.438281e-04 92
    8  Oct 2000   -39.65054 0.003892493  3.683646e-04 92
    9  Nov 2000   -36.12215 0.003578342  4.162776e-05 91
    10 Dec 2000   -33.18009 0.003317091 -5.103712e-05 92
    

    or if a data frame is preferred then:

    dplyr::bind_rows(L, .id = "depvar")
    

    giving:

       depvar  yearmon (Intercept)        date             z  n
    1       A Mar 2000  -931.08360 0.085201863 -3.783475e-02 91
    2       A Apr 2000  -645.75036 0.059306657  5.638294e-03 90
    3       A May 2000  -536.61413 0.049428357  3.528984e-03 92
    4       A Jun 2000  -468.31918 0.043263786 -6.769498e-03 91
    5       A Jul 2000  -420.69558 0.038976709 -7.307754e-05 92
    6       A Aug 2000  -384.52887 0.035729997  1.343427e-03 92
    7       A Sep 2000  -356.88052 0.033254748 -1.272157e-03 92
    8       A Oct 2000  -333.46329 0.031163998  1.980825e-03 92
    9       A Nov 2000  -314.39800 0.029466506  2.223839e-04 91
    10      A Dec 2000  -298.05960 0.028015670 -2.949753e-04 92
    11      B Mar 2000  -206.66238 0.019006840 -7.802128e-03 91
    12      B Apr 2000  -110.66468 0.010294703  1.301456e-03 90
    13      B May 2000   -83.11581 0.007801199  8.920903e-04 92
    14      B Jun 2000   -67.34099 0.006377318 -1.520903e-03 91
    15      B Jul 2000   -57.03138 0.005449255 -1.435477e-05 92
    16      B Aug 2000   -49.58352 0.004780660  2.702669e-04 92
    17      B Sep 2000   -44.11908 0.004291454 -2.438281e-04 92
    18      B Oct 2000   -39.65054 0.003892493  3.683646e-04 92
    19      B Nov 2000   -36.12215 0.003578342  4.162776e-05 91
    20      B Dec 2000   -33.18009 0.003317091 -5.103712e-05 92
    

    Note

    I am not clear on the intention of the statistics calculations in the question. I did find the formula at the top of page 8 of this document but it seems to vary from the one mentioned in the question. At any rate, at the very least it seems that the code in the question needs to square certain items that were not squared and note that coef(fm), sigma(fm) and diag(vcov(fm)) are the coefficients, residual standard error and coefficient standard errors squared.