I would like to calculate the standard deviation of a variable, weighting each observation by a second variable.
values <- c(100, 200, 300, 400, 200)
sizes <- c(12, 54, 1, 218, 63)
How do I find the standard deviation of values with weighting applied based on sizes?
The Hmisc package is rather large. In the time it took me to install and load that package, which has multiple dependencies, I did this using base R. First, I had to check the formula on Wikipedia: https://en.wikipedia.org/wiki/Weighted_arithmetic_mean
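Written out, the frequency-weights version of that formula, which is the one the R code below implements, is as follows. This is a transcription inferred from the code, with \mu^* denoting the weighted mean and s the weighted standard deviation:

```latex
\mu^* = \frac{\sum_{i} w_i x_i}{\sum_{i} w_i}, \qquad
V_1 = \sum_{i} w_i, \qquad
s = \sqrt{\frac{\sum_{i} w_i \,(x_i - \mu^*)^2}{V_1 - 1}}
```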
Note that V1 is just the sum of the weights. Then I just converted that into R-speak:
x <- c(100, 200, 300, 400, 200)
w <- c(12, 54, 1, 218, 63)
mu <- weighted.mean(x, w)
sqrt(sum(w * ((x-mu)^2))/(sum(w) - 1))
[1] 102.696
This agrees with the square root of the result from the wtd.var function in Hmisc (wtd.var returns the variance).
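If you want a quick sanity check that needs no extra packages, and assuming the weights are whole-number counts (as they are here), you can expand each value by its weight with rep() and take the ordinary sd(). This is only a sketch of a check, not something you would want to do with large or non-integer weights:

```r
x <- c(100, 200, 300, 400, 200)
w <- c(12, 54, 1, 218, 63)

# Weighted SD with frequency weights, same formula as above
mu <- weighted.mean(x, w)
sd_weighted <- sqrt(sum(w * (x - mu)^2) / (sum(w) - 1))

# Repeat each value w[i] times and use the plain sample SD
sd_expanded <- sd(rep(x, times = w))

# The two results should match up to floating-point tolerance
all.equal(sd_weighted, sd_expanded)
```

The match holds because sd() divides by n - 1, and for the expanded vector n is exactly sum(w).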
Of course, if you want added functionality, such as normalisation, maximum likelihood estimation, or automatic removal of NA values, then go with the wtd.var function. But the OP didn't specify any of that. Also, if your internet connection is slow, or you want to do things yourself and learn at the same time, then use my method. :)
Edit 1: And for reliability weights (normwt=TRUE):
> V1 <- sum(w)
> V2 <- sum(w^2)
> sqrt(sum(w * ((x-mu)^2))/(V1 - V2/V1))
[1] 138.3356
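One property of the reliability-weights version that may help when choosing between the two: it does not change if every weight is multiplied by the same constant, whereas the frequency-weights version does. A small check, where the helper names sd_freq and sd_rel are made up for this illustration only:

```r
x <- c(100, 200, 300, 400, 200)
w <- c(12, 54, 1, 218, 63)

# Hypothetical helpers, named for this example only
sd_freq <- function(x, w) {
  mu <- weighted.mean(x, w)
  sqrt(sum(w * (x - mu)^2) / (sum(w) - 1))
}
sd_rel <- function(x, w) {
  mu <- weighted.mean(x, w)
  V1 <- sum(w)
  V2 <- sum(w^2)
  sqrt(sum(w * (x - mu)^2) / (V1 - V2 / V1))
}

# Rescaling the weights leaves the reliability version unchanged
all.equal(sd_rel(x, w), sd_rel(x, w / sum(w)))

# The frequency version shifts, because sum(w) - 1 depends on the scale of w
sd_freq(x, w)
sd_freq(x, 10 * w)
```

So if your weights are counts, the frequency form is the natural one; if they are relative importances with an arbitrary scale, the reliability form is the safer choice.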
Edit 2: Handling missing values (na.rm=TRUE):
obs <- !is.na(x) & !is.na(w)
x <- x[obs]
w <- w[obs]
Then use the filtered x and w in the calculations above.
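Putting the pieces of this answer together, a minimal sketch of a single base R function might look like the following. The name weighted_sd and its arguments reliability and na.rm are made up for illustration; they are not part of base R or Hmisc:

```r
# Hypothetical helper combining the three snippets above
weighted_sd <- function(x, w, reliability = FALSE, na.rm = FALSE) {
  stopifnot(length(x) == length(w))
  if (na.rm) {
    # Keep only pairs where both the value and the weight are present
    obs <- !is.na(x) & !is.na(w)
    x <- x[obs]
    w <- w[obs]
  }
  mu <- weighted.mean(x, w)
  V1 <- sum(w)
  # Frequency weights: V1 - 1; reliability weights: V1 - V2 / V1
  denom <- if (reliability) V1 - sum(w^2) / V1 else V1 - 1
  sqrt(sum(w * (x - mu)^2) / denom)
}

x <- c(100, 200, 300, 400, 200)
w <- c(12, 54, 1, 218, 63)

weighted_sd(x, w)                      # frequency weights
weighted_sd(x, w, reliability = TRUE)  # reliability weights
weighted_sd(c(100, 200, NA, 400, 200), w, na.rm = TRUE)
```

Without na.rm = TRUE, a missing value in x or w propagates and the result is NA, which is the usual base R behaviour.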