Search code examples
rdata.tableweighted-average

Aggregate and Weighted Mean for multiple columns in R


The question is basically the samt as this: Aggregate and Weighted Mean in R.

But i want it to compute it on several columns, using data.table, as I have millions of rows. So something like this:

set.seed(42)   # fix seed so that you get the same results
dat <- data.frame(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
                                    tax=rnorm(20),tax2=rnorm(20), assets=1e7+1e7*runif(20), assets2=1e6+1e7*runif(20))

DT <- data.table(dat)

I can compute the weighted mean on one column, assets, like this:

DT[,list(wret = weighted.mean(tax,assets)),by=assetclass]

But how to do it on both assets and assets2?
What if there are several columns, like col=c("assets1", "assets2", "assets3", ... )? And is it also possible to do it for tax, tax1...


Solution

  • So you can do it for several columns of weights

    DT <- data.table(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
                      tax=rnorm(20), assets=1e7+1e7*runif(20), asets2=1e6+1e7*runif(20))
    DT[, lapply(.SD, FUN=weighted.mean, x=tax), by=assetclass, .SDcols=3:4]
    #    assetclass      assets       asets2
    # 1:          D -0.14179882 -0.003717957
    # 2:          B  0.61146928  0.523913589
    # 3:          E -0.28037796 -0.147677384
    # 4:          C -0.09658125 -0.010338894
    # 5:          A  0.74954460  0.750190947
    

    or you can exclude the non-weight columns from .SD:

    DT[, lapply(.SD, FUN=weighted.mean, x=tax), by=assetclass, .SDcols=-(1:2)]
    

    Here is a variant using matrix multiplication:

    DT[, as.list(crossprod(as.matrix(.SD), tax)/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]
    

    The matrix multiplication can do it also for several columns tax1, tax2, ...

    DT <- data.table(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
                     tax1=rnorm(20), tax2=rnorm(20), assets=1e7+1e7*runif(20), asets2=1e6+1e7*runif(20))
    DT[, as.list(crossprod(as.matrix(.SD), tax1)/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]
    DT[, as.list(crossprod(as.matrix(.SD), tax2)/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]
    DT[, as.list(crossprod(as.matrix(.SD), cbind(tax1, tax2))/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]