r median frequency-distribution quartile

Use R to calculate median without replicating elements

I have a frequency distribution with very large counts. I want to calculate the median and quartiles, but R complains. Here is what works for small counts:

> TABLE <- data.frame(DATA = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19), F = c(48,0,192,1152,5664,23040,77952,214272,423984,558720,267840,0,0,0,0,0,0,0,0))
> summary(rep(TABLE$DAT,TABLE$F))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   9.000  10.000   9.397  10.000  11.000

Here is what I get for huge counts:

> TABLE <- data.frame(DATA = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19), F = c(240,0,1200,9600,69600,470400,2992800,17859840,98312880,489292800,2164619760,8325820800,26865302400,68711068800,128967422400,153763315200,96770419200,26824089600,2395008000))
> summary(rep(TABLE$DAT,TABLE$F))
Error in rep(TABLE$DAT, TABLE$F) : invalid 'times' argument
In addition: Warning message:
In summary(rep(TABLE$DAT, TABLE$F)) :
  NAs introduced by coercion to integer range

This error does not surprise me, because with "rep" I was trying to create an enormous vector. But I do not know how to avoid this and still calculate the median and the quartiles.

Solution

Rather than trying to replicate that monster just to call summary(), you can compute "weighted quantiles", using the frequencies as weights. This post has a formula. But as with most things, once you know the right terms you can find a package that already does the work!

#install.packages("Hmisc")

TABLE <- data.frame(DATA = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19), F = c(240,0,1200,9600,69600,470400,2992800,17859840,98312880,489292800,2164619760,8325820800,26865302400,68711068800,128967422400,153763315200,96770419200,26824089600,2395008000))


# The argument is named `weights`; `weight` only works through partial matching
Hmisc::wtd.quantile(TABLE$DATA, probs = c(0.25, 0.5, 0.75), weights = TABLE$F)
#> 25% 50% 75% 
#>  15  16  16
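
If you would rather not add a dependency, the same idea can be sketched in base R by walking the cumulative frequencies. This is only a minimal sketch: `freq_quantile` is a hypothetical helper name, not part of any package, and it assumes the simple "smallest value whose cumulative share reaches p" definition of a quantile, which can differ from interpolating definitions when a probability falls exactly on the boundary between two values.

```r
# Frequency table from the question: values and their counts
values <- 1:19
freq <- c(240, 0, 1200, 9600, 69600, 470400, 2992800, 17859840, 98312880,
          489292800, 2164619760, 8325820800, 26865302400, 68711068800,
          128967422400, 153763315200, 96770419200, 26824089600, 2395008000)

# Hypothetical helper (an assumption, not a library function):
# returns the smallest value whose cumulative share of the total
# count reaches each requested probability
freq_quantile <- function(x, f, probs) {
  ord <- order(x)                      # sort values in ascending order
  x <- x[ord]
  f <- f[ord]
  cum_share <- cumsum(f) / sum(f)      # counts stay doubles, so no integer overflow
  sapply(probs, function(p) x[which(cum_share >= p)[1]])
}

freq_quantile(values, freq, c(0.25, 0.5, 0.75))
#> [1] 15 16 16
```

Because the counts are only summed and never expanded with rep(), memory use stays proportional to the number of distinct values. On the small table from the question the same helper gives 9, 10 and 10, matching the summary() output there.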

Created on 2018-04-06 by the reprex package (v0.2.0).