What's the difference between these two methods for calculating a weighted median?


I'm trying to calculate a weighted median, but I don't understand the difference between the following two methods. The answer I get from weighted.median() is different from the one I get from with(df, median(rep(value, count))), and I don't understand why. Is there more than one way to compute a weighted median? Is one preferable to the other?

df = read.table(text="row  count  value
1    1      25
2    2      26
3    3      30
4    2      32
5    1      39", header=TRUE)


# weighted median
with(df, median(rep(value, count)))
# [1] 30

library(spatstat)
weighted.median(df$value, df$count)
# [1] 28

Solution

  • Note that with(df, median(rep(value, count))) only makes sense for weights that are positive integers (rep will accept non-integer values for count but truncates them to integers), so it is not a fully general way to compute a weighted median. ?weighted.median shows that the function tries to find a value m such that the total weight of the data below m is 50% of the total weight. For your sample, no such m works exactly: with the counts shown, 33.3% (3 of 9) of the total weight is <= 26 and 66.7% (6 of 9) is <= 30. In a case like this, the default (type = 2) averages these two values to get the 28 that is returned. There are two other types; weighted.median(df$value, df$count, type = 1) returns 30. I am not completely sure whether this type will always agree with your other approach.
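
To see where the two answers come from, the cumulative weight shares can be worked out by hand in base R. This is only a sketch of the reasoning described in the answer, applied to the sample data; it is not spatstat's actual implementation, and the variable names (share, hi, lower, upper) are made up for illustration.

```r
# Sample data from the question
df <- data.frame(value = c(25, 26, 30, 32, 39),
                 count = c(1, 2, 3, 2, 1))

# Sort by value and compute the cumulative share of the total weight
ord   <- order(df$value)
x     <- df$value[ord]
w     <- df$count[ord]
share <- cumsum(w) / sum(w)
round(share, 3)
# [1] 0.111 0.333 0.667 0.889 1.000

# No value sits exactly at 50%, so look at the two values that bracket it
hi    <- which(share >= 0.5)[1]  # first value whose cumulative share reaches 50%
upper <- x[hi]                   # 30
lower <- x[hi - 1]               # 26 (assumes hi > 1, which holds for this data)

# Averaging the bracketing values gives the 28 reported as the default result
(lower + upper) / 2
# [1] 28

# Expanding the data by its counts picks the upper value instead
median(rep(df$value, df$count))
# [1] 30

# rep() truncates non-integer counts, so this trick breaks for general weights
rep(c(10, 20), times = c(1.9, 2.2))
# [1] 10 20 20
```

The last call shows why the rep() approach is limited to integer weights: counts of 1.9 and 2.2 are silently treated as 1 and 2.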