r statistics standard-deviation variance weighted

Standard Deviation of One Variable Weighted By Second Variable


I would like to calculate the standard deviation of one variable, weighting each observation by a second variable.

values <- c(100, 200, 300, 400, 200)
sizes <- c(12, 54, 1, 218, 63)

How do I find the standard deviation of values with weighting applied based on sizes?


Solution

  • That Hmisc package is rather large. In the time it took me to install and load it, with its multiple dependencies, I did this in base R. First, I checked the formula on Wikipedia (https://en.wikipedia.org/wiki/Weighted_arithmetic_mean):

    $$s^2 = \frac{\sum_{i=1}^{N} w_i \left(x_i - \mu^*\right)^2}{V_1 - 1}$$

    Note that V1 is just the sum of the weights. Then I just converted that into R-speak:

    x <- c(100, 200, 300, 400, 200)
    w <- c(12, 54, 1, 218, 63)
    mu <- weighted.mean(x, w)
    
    sqrt(sum(w * ((x-mu)^2))/(sum(w) - 1))
    [1] 102.696
    

    This agrees with the square root of the value returned by the wtd.var function from Hmisc, which computes the weighted variance.

    Of course, if you want added functionality, such as normalisation, maximum likelihood estimation, or automatic removal of NA values, then go with the wtd.var function. But the OP didn't ask for any of that. Also, if your internet connection is slow, or you want to do things yourself and learn at the same time, then use my method. :)

    Edit 1: And for reliability weights (normwt=TRUE):

    V1 <- sum(w)    # sum of the weights
    V2 <- sum(w^2)  # sum of the squared weights
    sqrt(sum(w * ((x-mu)^2))/(V1 - V2/V1))
    [1] 138.3356
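
    Written as a formula, the reliability-weight version swaps the denominator V1 - 1 for V1 - V2/V1. This is a sketch in the notation of the Wikipedia article linked above, assuming μ* denotes the weighted mean and V2 the sum of the squared weights:

    ```latex
    s^2 = \frac{\sum_{i=1}^{N} w_i \left(x_i - \mu^*\right)^2}{V_1 - V_2 / V_1},
    \qquad V_1 = \sum_{i=1}^{N} w_i,
    \qquad V_2 = \sum_{i=1}^{N} w_i^2
    ```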
    

    Edit 2: Handling missing values (na.rm=TRUE):

    obs <- !is.na(x) & !is.na(w)
    x <- x[obs]
    w <- w[obs]
    

    Then use the filtered x and w in the calculations above.
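
    The three pieces above (frequency weights, reliability weights, and NA handling) could be folded into one small helper. This is only a sketch: weighted_sd and its reliability and na.rm arguments are names made up for illustration, not part of base R or Hmisc.

    ```r
    # Hypothetical helper: weighted standard deviation in base R.
    # The function name and arguments are illustrative, not an existing API.
    weighted_sd <- function(x, w, reliability = FALSE, na.rm = FALSE) {
      stopifnot(length(x) == length(w))

      if (na.rm) {
        # Keep only observations where both the value and the weight are present
        obs <- !is.na(x) & !is.na(w)
        x <- x[obs]
        w <- w[obs]
      }

      mu <- weighted.mean(x, w)  # weighted mean
      V1 <- sum(w)               # sum of the weights

      # Frequency weights use V1 - 1; reliability weights use V1 - V2/V1
      denom <- if (reliability) V1 - sum(w^2) / V1 else V1 - 1

      sqrt(sum(w * (x - mu)^2) / denom)
    }

    values <- c(100, 200, 300, 400, 200)
    sizes  <- c(12, 54, 1, 218, 63)

    weighted_sd(values, sizes)                      # frequency weights
    weighted_sd(values, sizes, reliability = TRUE)  # reliability weights
    weighted_sd(c(values, NA), c(sizes, 10), na.rm = TRUE)
    ```

    With the data from the question, the first two calls should reproduce the 102.696 and 138.3356 shown above, and the third drops the incomplete observation before computing the same frequency-weighted result.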