Looping to construct Covariance Matrix from Regressions


I am trying to construct a covariance matrix that I believe has to be done using a loop.

I have a set of 30 regressions against a single index (Dow Jones), which gives a table of intercepts (alpha), slopes (beta_i), and standard deviations of the residuals (epsilon). I specifically need to construct the matrix σij = βi * βj * σ²m, where βi and βj are slopes from this table and σ²m is the index variance, stored in the variable dji_var. So first slope * first slope * dji_var populates the first element of the covariance matrix.

Does anyone have a loop that can do this easily for me? The dimensions of my covariance matrix should be 30x30.

Thank you

This is what I have so far:

############# Regressing each company’s returns onto the index return #########
#lm(AAPL~DJI), lm(AXP~DJI), lm(BA~DJI), lm(CAT~DJI), lm(CSCO~DJI), lm(CVX~DJI), lm(DD~DJI), lm(DIS~DJI),
#lm(GE~DJI),lm(GS~DJI),lm(HD~DJI),lm(IBM~DJI),lm(INTC~DJI), lm(JNJ~DJI), lm(JPM~DJI), lm(KO~DJI),   
#lm(MCD~DJI), lm(MMM~DJI), lm(MRK~DJI), lm(MSFT~DJI), lm(NKE~DJI), lm(PFE~DJI), lm(PG~DJI), lm(TRV~DJI),
#lm(UNH~DJI),lm(UTX~DJI),lm(V~DJI),lm(VZ~DJI),lm(WMT~DJI), lm(XOM~DJI)

resultdf <- data.frame(matrix(NA,0,4), stringsAsFactors = FALSE)
names(resultdf) <- c("Asset", "Intercept", "Slope", "Std_of_Residuals")
for (i in 1:30){
  regression_company_dji <- lm(timeseriesreturns[,i] ~ dji[,1])
  resultdf <- rbind(resultdf, data.frame(Asset= i,
                                         Intercept = regression_company_dji$coefficients[[1]],
                                         Slope= regression_company_dji$coefficients[[2]],
                                         Std_of_Residuals = sd(resid(regression_company_dji)) 
  ))
  # No manual i <- i + 1 is needed: for() advances i on every iteration
}
#prints a table of intercepts, slopes (βi), and idiosyncratic standard deviations σRi (standard deviation of the residuals) 
head(resultdf)
# Asset   Intercept     Slope Std_of_Residuals
#1     1  0.02676350 1.1387824        1.2474725
#2     2 -0.07187497 0.8535259        1.1008612
#3     3  0.06966935 1.0196946        0.9490182
#4     4 -0.12898852 1.0635297        1.2044883
#5     5  0.07498498 1.0600683        0.9935900
#6     6 -0.10309059 1.1483061        1.2779884

#ASSET 1 STARTS WITH AAPL, THEN GOES AXP, BA, CAT, ETC.
#WE HAVE A TABLE OF INTERCEPTS, SLOPES AND STANDARD DEVIATIONS OF RESIDUALS FOR EACH REGRESSION BETWEEN COMPANY RETURNS AND INDEX RETURN

############## Variance of DowJones Index return ######################
dji_var <- var(dji[,1])
#0.8873133
  
######### SINGLE INDEX APPROXIMATION #################################
# In Single-Index Model:
# Intercept = alpha_i
# Slope = beta_i
# Std_of_Residuals = sigma_Ri

# This equation is referred to as the single-index model:
#                   r_it = α_i + β_i * r_mt + ε_it                                    (8.1)
# α_i and β_i are the intercept and slope coefficients that result from
# regressing the rate of return on asset i in period t, denoted r_it, onto
# the simultaneous rate of return on some market index in period t, denoted r_mt;
# ε_it is the unexplained residual error term for asset i in period t.
#########################################################################################
# The covariance between two different securities i and j can be expressed as:
# σ_ij = β_i * β_j * σ_m^2

sigma_squared_market <- dji_var
# [1] 0.8873133

dput(resultdf)

structure(list(Asset = 1:30, Intercept = c(0.0267635033349584, 
-0.0718749662550324, 0.069669346056576, -0.128988516445594, 0.0749849799579864, 
-0.103090590571032, -0.0181204083787094, 0.0940216340701365, 
0.0601045129621876, -0.00712297315161099, 0.100323562649478, 
-0.0517406457596374, 0.012599051698687, -0.0218711039493553, 
0.0263255529821284, 0.0197321609378249, 0.08018398886968, 0.0115659025410572, 
-0.0207922446090187, 0.0629952677099163, 0.137484116508374, 0.0620066345319251, 
-0.0416494718503931, 0.0482722555478251, 0.0886134357472885, 
-0.0240313203975499, 0.142979385201501, -0.0193601624887868, 
-0.107001092634366, -0.0592959645858059), Slope = c(1.13878236093664, 
0.853525869839225, 1.01969460976746, 1.06352969847768, 1.06006825519905, 
1.14830613937928, 1.02057992982579, 0.917124514708528, 1.06521921561495, 
1.16527602124266, 1.01554236848894, 1.05028610720528, 0.99954945490449, 
0.854040163442602, 1.20416480868948, 0.662824098888303, 0.930011492883117, 
0.963949283094558, 0.953009111832057, 1.24362084628936, 0.982034757703831, 
0.885675351438922, 0.766292851924153, 0.873619973887616, 1.03103698221555, 
0.977088962832525, 1.11842324882864, 0.748745167476966, 0.77506736508709, 
1.05126852549869), Std_of_Residuals = c(1.24747249150145, 1.10086122769927, 
0.949018244224872, 1.20448829818015, 0.99358998832754, 1.2779884149182, 
1.13129109038816, 1.03393869712944, 0.999480572360969, 0.690970159142872, 
0.783262244296981, 0.868512560468288, 1.28324642163822, 0.656011164082018, 
0.717972392581603, 0.6617871220526, 0.901244987788103, 0.60700558064988, 
1.02292450298541, 1.34320098732505, 0.961307330185487, 0.815493628199713, 
0.650600876764784, 0.655802004867679, 1.1807969036117, 0.835687577257354, 
1.02888260605468, 0.721618425329537, 1.08066991592903, 0.955080491660557
)), row.names = c(NA, -30L), class = "data.frame")


Solution

  • We can get the outer product of the vector resultdf$Slope with itself, where

    The outer product of the arrays X and Y is the array A with dimension c(dim(X), dim(Y)) where element A[c(arrayindex.x, arrayindex.y)] = FUN(X[arrayindex.x], Y[arrayindex.y], ...).

    (from help("outer")). Here FUN is multiplication, which is also the default, so element [i, j] of the result is βi * βj; outer() can handle other functions as well. Then we just need to multiply each element by dji_var. The full solution is then

    covmat <- dji_var * outer(resultdf$Slope, resultdf$Slope, FUN = "*")
    

    To see this works as expected, consider the first few rows and columns:

    covmat[1:3, 1:3]
    #          [,1]      [,2]      [,3]
    # [1,] 1.150690 0.8624510 1.0303573
    # [2,] 0.862451 0.6464134 0.7722605
    # [3,] 1.030357 0.7722605 0.9226080
    

    which matches a hand calculation: for example, element [1, 1] is 1.1387824^2 * 0.8873133 ≈ 1.150690.
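
    Since the question asked for a loop, here is a minimal sketch of the explicit double loop that outer() replaces, checked against the one-liner. To stay self-contained it uses only the first three slopes and the index variance printed in the question; the names beta, n, covmat_loop, covmat_outer and covmat_tcp are made up for this illustration.

    ```r
    # First three slopes (AAPL, AXP, BA) and the index variance, copied from the question
    beta    <- c(1.1387824, 0.8535259, 1.0196946)
    dji_var <- 0.8873133

    # Explicit loop: fill element [i, j] with beta_i * beta_j * sigma_m^2
    n <- length(beta)
    covmat_loop <- matrix(NA_real_, nrow = n, ncol = n)
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        covmat_loop[i, j] <- beta[i] * beta[j] * dji_var
      }
    }

    # Vectorized equivalents: outer() and tcrossprod() both form beta %*% t(beta)
    covmat_outer <- dji_var * outer(beta, beta)
    covmat_tcp   <- dji_var * tcrossprod(beta)

    # Both comparisons should agree up to floating-point tolerance
    all.equal(covmat_loop, covmat_outer)
    all.equal(covmat_loop, covmat_tcp)
    ```

    With the full table, replacing beta by resultdf$Slope gives the 30x30 matrix. The loop and the vectorized forms compute the same thing; outer() is just shorter and avoids the index bookkeeping.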

    Update

    If you need to add a term to the diagonal (say, for regularization, or some other type of additional noise), you can simply do

    covmat <- dji_var * outer(resultdf$Slope, resultdf$Slope, FUN = "*")
    covmat <- covmat + diag(pi, nrow = nrow(covmat))
    covmat[1:3, 1:3]
    #          [,1]      [,2]      [,3]
    # [1,] 4.292283 0.8624510 1.0303573
    # [2,] 0.862451 3.7880061 0.7722605
    # [3,] 1.030357 0.7722605 4.0642007
    

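    Here I used pi as the constant, but it could be anything. If the added diagonal term needs to be a vector of differing elements, you can do that too. For example, adding the squared residual standard deviations puts each asset's total variance under the single-index model on the diagonal: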

    covmat <- dji_var * outer(resultdf$Slope, resultdf$Slope, FUN = "*")
    covmat <- covmat + diag(resultdf$Std_of_Residuals^2, nrow = nrow(covmat))
    covmat[1:3, 1:3]
    #          [,1]      [,2]      [,3]
    # [1,] 2.706878 0.8624510 1.0303573
    # [2,] 0.862451 1.8583089 0.7722605
    # [3,] 1.030357 0.7722605 1.8232437
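
    It may be worth running a few sanity checks on the result. The sketch below rebuilds the 3x3 corner of the last matrix from the values printed in the question, labels it with the tickers the question gives for assets 1 to 3, and then checks symmetry and positive definiteness and converts it to correlations. The names tickers, beta, resid_sd, eig and cormat are illustrative only.

    ```r
    # Values for the first three assets, copied from the table in the question
    tickers  <- c("AAPL", "AXP", "BA")
    beta     <- c(1.1387824, 0.8535259, 1.0196946)
    resid_sd <- c(1.2474725, 1.1008612, 0.9490182)
    dji_var  <- 0.8873133

    # Systematic part from the betas plus residual variances on the diagonal
    covmat <- dji_var * outer(beta, beta) + diag(resid_sd^2)
    dimnames(covmat) <- list(tickers, tickers)

    # A covariance matrix should be symmetric ...
    isSymmetric(covmat)

    # ... and positive definite, i.e. all eigenvalues strictly positive
    eig <- eigen(covmat, symmetric = TRUE, only.values = TRUE)$values
    all(eig > 0)

    # Implied correlation matrix (unit diagonal)
    cormat <- cov2cor(covmat)
    round(cormat, 3)
    ```

    Note that the matrix built from outer() alone has rank 1, so it is singular and cannot be inverted. Adding a positive diagonal term, as in the two updates above, is what makes it positive definite.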