
Calculate the median of pre-aggregated data (means and counts) in R


How can I calculate a proper median on data that has already been aggregated?

For example, if I have a data frame that looks like this:

> df <- data.frame(name = c("A","B","C","D"), count = c(1,3,5,2), avg = c(100,50,20,10))
> df
  name count avg
1    A     1 100
2    B     3  50
3    C     5  20
4    D     2  10

Assume we know little about what is inside each bin, except that there is little variation within a bin. To the best of our knowledge, the values would then line up like this:

10 10 20 20 20 20 20 50 50 50 100

With 11 values, the median is the 6th one, which is 20.

But if I simply call median() on the avg column, R computes it over just the 4 bin means (10, 20, 50, 100), ignoring the counts:

> median(df$avg)
[1] 35

That is not what I want.

How can I get around this and "unfold" the data set?


Solution

  • Solved thanks to a comment by Zheyuan Li: repeat each avg value count times with rep.int(), then take the median of the expanded vector. It is simple, and I'm surprised I didn't know about it.

    with(df, median(rep.int(avg, count)))
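
To make the one-liner easier to check, here is a minimal sketch that runs it on the example data and prints the expanded vector. The helper binned_median() is hypothetical (it is not part of the original answer or of base R); it is one possible way to get the same result from cumulative counts without building the expanded vector, assuming the counts are non-negative whole numbers and their total is at least 1.

```r
# Example data from the question
df <- data.frame(name  = c("A", "B", "C", "D"),
                 count = c(1, 3, 5, 2),
                 avg   = c(100, 50, 20, 10))

# Repeat each bin mean `count` times, as in the answer above
expanded <- with(df, rep.int(avg, count))
sort(expanded)    # 10 10 20 20 20 20 20 50 50 50 100
median(expanded)  # 20

# Hypothetical helper: same median, computed from cumulative counts
# instead of materialising the expanded vector
binned_median <- function(avg, count) {
  ord <- order(avg)
  avg <- avg[ord]
  cum <- cumsum(count[ord])
  n   <- cum[length(cum)]
  # lower and upper middle positions (identical when n is odd)
  lo <- avg[which(cum >= floor((n + 1) / 2))[1]]
  hi <- avg[which(cum >= ceiling((n + 1) / 2))[1]]
  (lo + hi) / 2
}

with(df, binned_median(avg, count))  # 20
```

The rep.int() version is the simplest choice for small counts; a cumulative-count approach like the sketch may be preferable when the counts are large enough that the expanded vector would be expensive to build.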