I want to compute w %*% Sigma %*% t(w)
for each row in a particular dataframe, where w
is a row vector in each row of my dataframe and Sigma
is a covariance matrix.
Sigma
changes for each line of the given dataframe. It is calculated based on the date given in column Reset_Date
in my original dataframe. The covariance matrix works, it is calculated correctly in another part of the code. But it does not work inside the apply()
.
apply(df,1, function(x) as.matrix(x[,3:8])%*%
cov(Dados[c((which(Dados$Date==x$Reset_Date)):((which(Dados$Date==x$Reset_Date))-63)),c(2:ncol(Dados))])%*%
t(as.matrix(x[,3:8])))
There must be something wrong with my code, because even the simpler version of calculating the dot product fails. Somehow the result is a matrix, not one scalar for each line.
apply(df[,3:8],1, function(x) as.matrix(x)%*%t(as.matrix(x)))
Any ideas on how to fix either example? Here is some sample data (dataframes Dados
and df
):
Dados=structure(list(Date = structure(c(16905, 16906, 16909, 16910,
16911, 16913, 16916, 16917, 16918, 16919, 16920, 16923, 16924,
16925, 16926, 16927, 16930, 16931, 16932, 16933, 16934, 16937,
16938, 16939, 16940, 16941, 16944, 16945, 16946, 16948, 16951,
16952, 16953, 16954, 16955, 16958, 16959, 16960, 16961, 16962,
16965, 16966, 16967, 16968, 16969, 16972, 16973, 16974, 16975,
16976, 16979, 16980, 16981, 16982, 16983, 16986, 16987, 16988,
16989, 16990, 16993, 16994, 16995, 16996, 16997, 17000, 17001,
17002, 17003, 17004, 17007, 17008, 17009), class = "Date"), c(0.00960585109965884,
0.0812984880734113, 0.112781679522822, -0.0145184885929517, -0.011662089633413,
-0.0327527768959612, 0.0151436007187478, -0.0608315211005128,
0.040412688006497, -0.0879756056911929, 0.00317371425342383,
-0.0734061020879029, 0.0221495449282072, -0.031900508719751,
-0.00909652588569898, -0.0132414400054515, -0.0636499025520987,
0.0595155178929385, 0.0496966022372147, 0.0278539993905147, -0.0232250551043034,
-0.0315789056561488, -0.0571768253829652, -0.0511197886498449,
0.0489696431147379, 0.0178738652271138, -0.0506959500704562,
-0.00254709665075126, -0.0299114881227025, -0.0215640151840946,
0.107550799579403, 0.00145426529312154, 0.0676915429902003, 0.0211475594729338,
0.0643398963306341, 0.00496393940383477, -0.0526404298981098,
0.0380905791373509, -0.0891580558943561, -0.0645347544569685,
0.00570724634156861, -0.0412632543372649, -0.0208006548438888,
-0.0470550159555883, -0.0123849706687551, -0.0023477039443609,
0.0819777903542596, 0.00618852230414024, 0.022977414335057, 0.0280724753787043,
0.0551947631538763, -0.195678423483481, -0.0880383210535407,
-0.0841519282731906, 0.112873303891403, 0.0175334088901247, -0.045942677930122,
-0.0300476169326158, 0.00351042344834962, 0.0389348704162185,
0.00829105248665307, -0.00693793603785409, 0.0280590459741381,
-0.00540102675314902, -0.0239906439583271, 0.0163368695597521,
-0.00674931659354616, 0.0439348031396891, -0.153964767847314,
-0.031463293190126, 0.0069188458542202, -0.0416134046884897,
0.00408029958172129), c(-0.292209663232257, 0.103599520438458,
0.274708851088623, -0.0557109374167419, -0.157957979992429, -0.0554170235205764,
0.113431253533336, -0.0600003781444314, 0.162598822729643, -0.311927817466595,
0.599523149849479, -0.103795741757395, -0.0776634595741843, 0.0269244302550042,
0.217581466695527, -0.0322834660227755, -0.191377271191473, 0.300726294548759,
0.319934388717669, 0.295157465398543, 0.00890284707404376, -0.0641387079437505,
-0.0459754968033854, -0.289985711542395, 0.0722398906096711,
-0.123006711056173, -0.490962832929132, -0.175032264765085, -0.12032435526016,
-0.176141231274118, 0.11500605744208, -0.0923868671712014, 0.190425336380873,
0.277445949794952, 0.406070025160576, 0.0952039477377431, -0.0576003722424168,
0.140203114680371, 0.0197629475892525, -0.0752226248850074, 0.0693752560910754,
0.0455597855569589, -0.102661342015731, -0.179332559157575, -0.11668358170327,
-0.00276900524260881, -0.0112290596820275, -0.0493628339251595,
0.078736802678181, -0.0479679813838452, 0.270953545649233, -0.587305172541941,
-0.621844499591828, -0.166447303595396, 0.469400973311429, -0.120315498643209,
-0.0846596216766748, 0.163283239995615, 0.0301963989209453, 0.135297886837837,
-0.0210178896502544, -0.00776600620899792, 0.0639517026860714,
0.0994514622426612, 0.0804093657452043, -0.0396069299241519,
-0.17970264410071, -0.0720211595746423, -0.309769824186645, -0.0426919498663469,
-0.0413861665933934, -0.0900483966791432, 0.041316666402369),
c(-0.172923456722018, 0.134604714780284, -0.1560279435179,
-0.129325171317674, -0.447953460277239, -0.250739954530577,
-0.133060846029998, -0.200451934763202, 0.292155683756645,
0.0755916721141059, 0.0991565124715654, -0.321059221302833,
0.317447102566271, 0.0541416607111156, 0.16780010277575,
-0.180523755295636, 0.0725824189477198, -0.0474727311916778,
0.0660469446123413, -0.258791216526055, 0.207473892380716,
-0.318984991642168, -0.0847417109554605, -0.714407448278065,
0.201086896294655, -0.0629606726738619, -0.0346745144537142,
-0.127772777391799, -0.14244413702964, 0.165634200853604,
-0.0524648703481123, -0.00501155817775878, -0.107636307277348,
0.158348029790401, 0.554012994905162, -0.152487582847627,
0.0209494458093484, 5.21267740971609e-05, 0.122168764166197,
0.175188759978351, 0.099845530350251, -0.0168977078334631,
0.0635841364313494, 0.122308954985573, -0.321999765736414,
-0.302960538847608, -0.195812633272907, -0.00867287475447842,
-0.377386838928739, 0.844449416990134, 0.665042348450107,
-0.0560141731879904, -0.118387030913281, -0.127366513074922,
0.155434880032068, -0.0524563488240037, 0.533499307196594,
-0.176858024007176, -0.0116683258824191, 0.0569655478134923,
-0.401625823151397, -0.48236119067917, 0.208467649310506,
-0.403011638845796, -0.380580334051539, 0.00128658083677902,
0.091238844006214, -0.195045070922639, 0.0375062080320232,
-0.0658825153824982, -0.0900340515372577, -0.000997745642150605,
0.244937614122231), c(-0.496406493957458, 1.82907214245793,
0.687389929405513, -0.868168973766292, -0.306820341474912,
1.24806375771418, -0.785290365465008, -0.748408280952884,
-0.0439414578156772, -1.15325279172931, -1.14831330200486,
1.34556712459546, 2.14439654026597, -1.07129621359316, 0.168114328681135,
-1.25447291028965, 0.764345101139363, -1.72380844604273,
-0.864886480236537, 0.705645454523873, 1.21877970414914,
-0.453807426490982, -0.103915626802553, 1.66893226342728,
0.456333904610107, -1.31941645733865, 1.16841346197136, 0.196616476842348,
0.455618551464032, 0.225278111059102, -0.789225340102151,
1.1669151547465, -0.857419069552467, -0.46375835280883, -1.64309626794125,
-0.976897895566697, -1.33753794277297, -2.20380877689387,
0.76534622622666, 0.896877396532703, 1.14413915494875, 0.430897580220613,
-1.11470964937711, 0.416358159120978, -1.44082437967862,
-0.771037731706115, -0.346660169353696, -0.315088673574393,
-1.02879378992917, 0.658626287068875, 0.837519010675614,
-2.83708881346894, -2.06085441775258, -1.11120482833581,
0.406117230674652, 1.09657628991393, 1.16335336420141, 0.812029759062316,
0.744500956980687, -1.90823722791477, 0.275859124192412,
-0.603942162238758, -0.107292555769767, -1.17736558795127,
0.63200279455371, -0.453611159666423, 0.371833612317296,
-0.334115289868742, 0.758067227135695, -0.110330422599259,
0.142565898070357, -0.463040430759687, 0.069680935638794),
c(-1.44252854343511, 1.50584329170067, -0.679304963566529,
1.49017730045089, -0.199777580504445, -1.40063647843419,
-2.02982723227936, 2.30147598638455, 2.5760586765625, -0.356830865331259,
-0.791625843271082, -0.699779830023839, -2.48209778254743,
0.507451023252337, -1.73031060711909, 0.0380590482016663,
-1.45960291459708, 4.02840531052773, -0.629898888003211,
0.851315865438118, -2.75153031138621, -0.0551356722263807,
-1.9123482098551, -0.598772542428672, -0.901269186047482,
-0.869854203811926, -0.8415157385536, -0.0225120534220302,
0.226544641604676, -0.924225208065921, -0.230135314202118,
-1.05856046752094, 1.06353418823462, 1.73196386595527, 1.41535767476162,
-0.423274291874653, 0.0587189624158047, 2.20834035330906,
-1.04189900598274, -3.37082319034527, 0.43036818352391, -2.09127622720844,
0.495243057530081, 0.963348655276053, 0.194895155374741,
1.55353832464475, 0.957778107172214, -1.39300452998466, 2.74583010854526,
-2.87356974415359, -1.76832522655593, 1.49292893448334, 1.93799313574115,
0.976947606342438, 1.31788691450525, 0.590087144719553, -1.4342449850373,
0.062377260576274, 0.164986209055984, 2.11246557193216, 1.48942844632503,
0.496632188894508, 0.577669475760856, 1.56402903813502, 0.123037811391691,
1.57762064227671, 0.326145986443338, -0.264116111911472,
0.0596585364206703, 0.584152800203808, -0.279382462032451,
-0.210669397167951, 0.0709602555787558), c(-0.0351709286018131,
-0.150881086035359, 0.601645988249588, 0.255978688101188,
0.0236933704295117, -0.567090501196776, -0.233687453902298,
0.134822177205463, 0.112484065576579, -0.97553415256052,
-0.558778428576845, 0.728533991939528, -0.920135343412154,
-0.646147884809356, -0.0763445803970142, 0.264988189688498,
0.0228996249262536, 1.19589898895847, -1.00861894342493,
-0.0693899810052745, -0.900303057566632, 0.927216320396496,
-0.993604425464922, -0.031929111972917, -0.423137092074644,
0.549488194589487, -0.26102501486347, 1.31568433032141, 0.645025908357066,
0.355081396389956, -0.0524648703481123, -0.152505708458051,
0.060563779407552, 0.230018059927928, -0.343648012783104,
0.437258121587658, 0.0765049489658054, 0.27848497837073,
-0.224249429047296, -0.969974752705105, -0.863990602065634,
-0.232333372512006, -0.236531757215996, 0.260839873862939,
-0.378251169565813, 0.528342245508862, 0.218763512644027,
-0.217625279775147, 1.28393978636681, -3.64443676454954,
-1.86212791696754, 1.72457029956796, 1.65081887178389, 1.30403541749646,
0.14238381012821, -0.0524563488240037, -0.737204478591569,
0.482828504338562, -0.139597650373546, 1.47287231225834,
0.288402824021206, 0.648453062535115, -0.0389746600371721,
0.473448744260674, -0.145358403641205, 0.185774251899384,
-0.195979063698148, 0.374579453017043, -0.413718154457121,
0.402926153900052, -0.353606189676337, -0.0201758132887475,
-0.172336080005542)), class = "data.frame", row.names = c(NA,
73L))
df=structure(list(Reset_Date = structure(c(16996, 16997, 17000,
17001, 17002, 17003, 17004, 17007, 17008, 17009), class = "Date"),
structure(c(16997, 17000, 17001, 17002, 17003, 17004, 17007,
17008, 17009, 17010), class = "Date"), c(1.82637665045793,
0.838634901792124, -0.322817945118891, 0.884944453234911,
0.410642820219065, 0.816526995015299, 1.94844363045823, 0.833011009391704,
1.68127046626407, 1.13013668001142), c(-0.183091416363673,
-0.319511762676758, -0.0316593389910776, 0.259788486653649,
0.492005994872685, 0.201027165479703, 0.0939241784296569,
0.231492377659909, 0.340750751275363, 0.0524028691361074),
c(0.231101296885914, 0.231101296885914, 0.000521792803768906,
0.000521792803768906, 0.0757021972820923, -0.0207214984966843,
-0.256874719990114, -0.0624916417149904, -0.299739460405469,
-0.0599027654003357), c(-0.106863689224328, -0.106863689224328,
-0.146576498534379, -0.146576498534379, -0.146576498534379,
-0.146576498534379, -0.0876257469847474, -0.0546632235816992,
0.00454466736633395, -0.0546303193896886), c(0.110991660181017,
0.152418298691739, 0.152418298691739, 0.136643097053495,
0.162449054526432, 0.197965817610092, 0.155889024964584,
0.113155738867012, 0.113155738867012, 0.0894339689144131),
c(0.0731007117478923, 0.0508244737588497, -0.0236654986334738,
-0.0984261218820228, -0.0984261218820228, -0.173090546127112,
-0.247883779469496, -0.173170115673856, -0.0985569058107293,
-0.0239349811169282)), row.names = c(NA, 10L), class = "data.frame")
First, let me help other readers better understand your problem (which you really failed to clarify).
Dados
is a data frame of 6 variables observed in consecutive days;df
is a data frame of weights (w
) for these variables on consecutive days.(Therefore, if df
has N rows, Dados
should have N + 63 rows.)
For each date in df
, compute:
Sigma
using observations in a 64-day period to date;Sigma
and w
.The following is my attempt to this problem.
## observations in consecutive days
date1 <- Dados$Date
obs <- as.matrix(Dados[, -1])
## weights in consecutive days
date2 <- df$Reset_Date
wts <- as.matrix(df[, 3:8])
## compute the desired quadratic form for the i-th day in `date2`
QuadForm <- function (i) {
## compute a window `Sigma`
Sigma <- cov(obs[i:(i + 63), ])
## extract `w`
w <- wts[i, ]
## compute the desired quadratic form
c(crossprod(w, Sigma %*% w))
}
## loop through days in `date2`
sapply(1:length(date2), QuadForm)
# [1] 0.08429908 0.10472934 0.10985758 0.09600481 0.12414915 0.13099761
# [7] 0.08225033 0.04184155 0.05513798 0.03362151
I was reluctant to fix your code, because doing row-wise operation on a data frame using apply
is a bad practice and is not worth a fix. But if you insist on knowing how to fix it, read on.
apply
is designed for matrices or arrays. If you feed it a data frame, it converts the data frame to a matrix at run time. This implicit conversion can cause you trouble, when your data frame have different data types across its columns.
## columns 3 to 8 are all numeric
## you get a numeric matrix after conversion
as.matrix(df[, 3:8])
#1 1.8263767 -0.1830914 0.2311013 -0.1068637 0.1109917 0.07310071
#2 0.8386349 -0.3195118 0.2311013 -0.1068637 0.1524183 0.05082447
## you can sum numeric values
apply(df[, 3:8], 1, sum)
# 1.9516152 0.8466035 -0.3717792 1.0368952 0.8957974 ...
## dang!! columns 1 to 2 are strings; columns 3 to 8 are numeric
## you get a character matrix after conversion
as.matrix(df)
#1 "2016-07-14" "2016-07-15" " 1.8263767" "-0.18309142" " 0.2311012969" ...
#2 "2016-07-15" "2016-07-18" " 0.8386349" "-0.31951176" " 0.2311012969" ...
## dang!! you can't sum strings
apply(df, 1, sum)
#Error in FUN(newX[, i], ...) : invalid 'type' (character) of argument
Whether you do row-wise (MARGIN = 1
) or column-wise (MARGIN = 2
) operations, the function fed to apply()
must work with vectors only. Why? Because extracting a row or a column from a matrix gives you a vector without dimension.
## dang!! can not index an atomic vector by $
apply(df, 1, function (x) substring(x$Reset_Date, 1, 4))
#Error in x$Reset_Date : $ operator is invalid for atomic vectors
## OK
apply(df, 1, function (x) substring(x[1], 1, 4))
#"2016" "2016" "2016" "2016" "2016" ...
## dang!! can not index an atomic vector by [, 1:2]
apply(df[, 3:8], 1, function (x) sum(x[, 1:2]))
#Error in x[, 1:2] : incorrect number of dimensions
## OK
apply(df[, 3:8], 1, function (x) sum(x[1:2]))
# 1.6432852 0.5191231 -0.3544773 1.1447329 0.9026488 ...
In addition, a vector without dimension is treated as a column vector (in the math sense). So even if you think x
is a row-vector, the dot product is in fact t(x) %*% x
instead of x %*% t(x)
.
## dang!! not computing dot product
apply(df[, 3:8], 1, function (x) x %*% t(x))
## this computes dot product
apply(df[, 3:8], 1, function (x) t(x) %*% x)
# 3.4516647 0.8960384 0.1504901 0.9004607 0.4739903 ...
Fully understanding the above issues, your code should be fixed as
## as.matrix(df) is a string matrix
## so the input 'x' will be a string vector
fun <- function (x) {
## 1. x[3:8], not x[, 3:8]
## 2. must as.numeric() or you have a string vector
w <- as.numeric(x[3:8])
pos <- which(Dados$Date == x[1])
span <- pos:(pos - 63)
Sigma <- cov(Dados[span, -1])
## not w %*% Sigma %*% t(w)
t(w) %*% Sigma %*% w
}
apply(df, 1, fun)
#0.08429908 0.10472934 0.10985758 0.09600481 0.12414914 0.13099762 0.08225033
#0.04184155 0.05513799 0.03362152
The result is identical to my proposed solution.
To demonstrate how inefficient your (fixed) code is, let's do a benchmark.
library(microbenchmark)
microbenchmark("me" = sapply(1:length(date2), QuadForm),
"you" = apply(df, 1, fun))
#Unit: microseconds
# expr min lq mean median uq max neval
# me 521.704 542.1485 577.5259 574.4675 597.773 755.889 100
# you 6121.230 6151.9695 6412.4002 6197.5450 6237.770 9448.530 100