r, weighted-average

Weighted average for different columns in R


I want to calculate the weighted average for different columns in a data frame. This is my data:

x.4 <- c(2,3,4,5)
a.4 <- c(2,3,4,5)
x.8 <- c(3,24,2,2)
a.8 <- c(2,3,4,7)
x.12 <- c(3,2,4,5)
a.12 <- c(3,2,4,5)
x.24 <- c(2,4,5,2)
a.24 <- c(2,4,5,2)
x.36 <- c(2,1,3,6)
a.36 <- c(2,4,5,2)
x.50 <- c(2,3,5,2)
a.50 <- c(2,3,5,20)
x.100 <- c(2,3,4,5)
a.100 <- c(2,3,4,5)
x.10000 <- c(2,3,46,2)
a.10000 <- c(2,32,46,2)


name_x <- c("a", "b", "c", "d")
df <- data.frame(name_x, x.4, a.4, x.8, a.8, x.12, a.12, x.24, a.24, x.36, a.36, x.50, a.50, x.100, a.100, x.10000, a.10000)

What I want is to create an "x" and an "a" variable containing the weighted average, using a weight of 8 for the columns that end in 4, 7 for those that end in 8, 6 for those that end in 12, and so on.

x = (x.4 * 8 + x.8 * 7 + x.12 * 6 + x.24 * 5 + ... + x.10000 * 1) / 36
a = (a.4 * 8 + a.8 * 7 + a.12 * 6 + a.24 * 5 + ... + a.10000 * 1) / 36

I created a new variable for each column multiplied by its weight and then divided by the sum of the weights, but this takes a lot of time. Is there a cleaner way to calculate the weighted average in this situation?


Solution

  • Piece-wise, you can use

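    # Positions of the x.* columns; the weights run 8, 7, ..., 1 in column order,
    # so the columns must already be sorted by ascending suffix.
    Xs <- grep("^x\\.", names(df))
    Xs_seq <- rev(seq_along(Xs))
    # The matrix product gives weighted row sums; sum(Xs_seq) is 36 here.
    as.matrix(df[Xs]) %*% matrix(Xs_seq, ncol = 1) / sum(Xs_seq)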
    #          [,1]
    # [1,] 2.361111
    # [2,] 6.833333
    # [3,] 4.888889
    # [4,] 3.777778
    
    As <- grep("^a\\.", names(df))
    As_seq <- rev(seq_along(As))
    as.matrix(df[As]) %*% matrix(As_seq, ncol = 1) / sum(As_seq)
    #          [,1]
    # [1,] 2.166667
    # [2,] 3.888889
    # [3,] 5.500000
    # [4,] 5.805556
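
    If the goal is to keep the averages as ordinary columns next to name_x, the one-column matrices can be flattened with drop(). The following is only a sketch: the names w, x_avg, a_avg and result are chosen here for illustration, and it assumes the x.* and a.* columns appear in ascending suffix order, as they do in the question.

    ```r
    # Rebuild the question's data so the sketch runs on its own.
    df <- data.frame(
      name_x  = c("a", "b", "c", "d"),
      x.4     = c(2, 3, 4, 5),   a.4     = c(2, 3, 4, 5),
      x.8     = c(3, 24, 2, 2),  a.8     = c(2, 3, 4, 7),
      x.12    = c(3, 2, 4, 5),   a.12    = c(3, 2, 4, 5),
      x.24    = c(2, 4, 5, 2),   a.24    = c(2, 4, 5, 2),
      x.36    = c(2, 1, 3, 6),   a.36    = c(2, 4, 5, 2),
      x.50    = c(2, 3, 5, 2),   a.50    = c(2, 3, 5, 20),
      x.100   = c(2, 3, 4, 5),   a.100   = c(2, 3, 4, 5),
      x.10000 = c(2, 3, 46, 2),  a.10000 = c(2, 32, 46, 2)
    )

    Xs <- grep("^x\\.", names(df))
    As <- grep("^a\\.", names(df))
    # Weights 8, 7, ..., 1 (assumption: both groups have the same number of
    # columns and are ordered by ascending suffix).
    w <- rev(seq_along(Xs))

    # drop() turns each 4 x 1 matrix into a plain numeric vector.
    x_avg <- drop(as.matrix(df[Xs]) %*% w / sum(w))
    a_avg <- drop(as.matrix(df[As]) %*% w / sum(w))

    # One row per name, with the two weighted averages as regular columns.
    result <- cbind(df["name_x"], x = x_avg, a = a_avg)
    result
    ```

    Writing the values into a separate data frame keeps the original columns untouched; assigning to df$x and df$a instead would work the same way.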
    

    If your pattern of column names is consistent (such as "single letter, a period, then numbers"), then you can handle every prefix at once:

    Z <- unique(gsub("\\..*", "", grep("^.\\.[0-9]+$", names(df), value = TRUE)))
    Z
    # [1] "x" "a"
    
    lapply(setNames(nm = Z), function(z) {
      Zs <- grep(paste0("^", z, "\\."), names(df))
      Zs_seq <- rev(seq_along(Zs))
      as.matrix(df[Zs]) %*% matrix(Zs_seq, ncol = 1) / sum(Zs_seq)
    })
    # $x
    #          [,1]
    # [1,] 2.361111
    # [2,] 6.833333
    # [3,] 4.888889
    # [4,] 3.777778
    # $a
    #          [,1]
    # [1,] 2.166667
    # [2,] 3.888889
    # [3,] 5.500000
    # [4,] 5.805556
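
    Both versions above take the weights from the position of each column, so they rely on the columns being stored in ascending suffix order. If that cannot be guaranteed, one possible variation is to derive the weights from the numeric suffix itself. The helper weighted_by_suffix below is hypothetical (it is not part of the original answer) and assumes that the suffixes within a prefix are distinct and that the smallest suffix should get the largest weight.

    ```r
    # Hypothetical helper: weights come from the rank of each numeric suffix,
    # so the order of the columns in the data frame does not matter.
    weighted_by_suffix <- function(data, prefix) {
      cols   <- grep(paste0("^", prefix, "\\.[0-9]+$"), names(data), value = TRUE)
      suffix <- as.numeric(sub("^[^.]*\\.", "", cols))
      # Smallest suffix gets the largest weight: n, n - 1, ..., 1.
      w <- rank(-suffix)
      drop(as.matrix(data[cols]) %*% w / sum(w))
    }

    # Rebuild the question's data so the sketch runs on its own.
    df <- data.frame(
      name_x  = c("a", "b", "c", "d"),
      x.4     = c(2, 3, 4, 5),   a.4     = c(2, 3, 4, 5),
      x.8     = c(3, 24, 2, 2),  a.8     = c(2, 3, 4, 7),
      x.12    = c(3, 2, 4, 5),   a.12    = c(3, 2, 4, 5),
      x.24    = c(2, 4, 5, 2),   a.24    = c(2, 4, 5, 2),
      x.36    = c(2, 1, 3, 6),   a.36    = c(2, 4, 5, 2),
      x.50    = c(2, 3, 5, 2),   a.50    = c(2, 3, 5, 20),
      x.100   = c(2, 3, 4, 5),   a.100   = c(2, 3, 4, 5),
      x.10000 = c(2, 3, 46, 2),  a.10000 = c(2, 32, 46, 2)
    )

    # Reverse the column order on purpose to show that position no longer matters.
    shuffled <- df[rev(names(df))]

    # A 4 x 2 matrix with one column per prefix.
    res <- sapply(c(x = "x", a = "a"), function(p) weighted_by_suffix(shuffled, p))
    res
    ```

    With the question's data, res should contain the same x and a values as the output above, even though the columns were reversed first.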