I have the following data set, where each count (numerosity) refers to an income interval, and the intervals have different widths:
Income Numerosity
from 6000 to 7500 704790
from 7500 to 10000 1294784
from 10000 to 12000 1051902
from 12000 to 15000 1585132
from 15000 to 20000 704012
from 20000 to 25000 206901
from 25000 to 30000 156661
I'd like to obtain an approximate data set on a regular grid of income levels, as follows:
Income Numerosity
6000 ...
7000 ...
8000 ...
... ...
30000 ...
To this end, I tried the following: first I used sample(6000:7500, 704790, replace=TRUE) for each row and concatenated the results into a vector rpop of generated observations. Then I applied the density function (trying different values of the bw parameter to smooth the distribution):
d=density(rpop,bw=2000,from=6000,to=30000,n=25)
d$x gives the required income levels, while the numerosities are proportional to d$y.
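The steps above can be sketched end to end as follows. This is only an illustration of the procedure as described: the names lower, upper, counts and est are hypothetical, the seed is fixed purely for reproducibility, and the scaling of d$y to counts per 1000-wide band is an assumption (density integrates to 1, so multiplying by the sample size and the band width gives an approximate count).

```r
# Interval bounds and counts, copied from the table above
lower  <- c(6000, 7500, 10000, 12000, 15000, 20000, 25000)
upper  <- c(7500, 10000, 12000, 15000, 20000, 25000, 30000)
counts <- c(704790, 1294784, 1051902, 1585132, 704012, 206901, 156661)

set.seed(1)  # assumption: fixed seed, only so the run is reproducible

# One sample() call per interval, concatenated into a single vector
rpop <- unlist(Map(function(lo, hi, n) sample(lo:hi, n, replace = TRUE),
                   lower, upper, counts))

# Kernel density estimate evaluated at 6000, 7000, ..., 30000
d <- density(rpop, bw = 2000, from = 6000, to = 30000, n = 25)

# Assumption: rescale the density to an approximate count per 1000-wide band
est <- data.frame(Income = d$x, Numerosity = d$y * length(rpop) * 1000)
```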
However, I wonder if there are better (more direct or elegant) ways to obtain the same result.
The approx function is meant for this kind of interpolation.
Example:
> d <- read.table(header=TRUE, text="Income Numerosity
+ 6000 704790
+ 7500 1294784
+ 10000 1051902
+ 12000 1585132
+ 15000 704012
+ 20000 206901
+ 25000 156661")
> res <- approx(d$Income, d$Numerosity, seq(from=6000, to=30000, length.out=25))
> res
$x
[1] 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000
[13] 18000 19000 20000 21000 22000 23000 24000 25000 26000 27000 28000 29000
[25] 30000
$y
[1] 704790.0 1098119.3 1246207.6 1149054.8 1051902.0 1318517.0 1585132.0
[8] 1291425.3 997718.7 704012.0 604589.8 505167.6 405745.4 306323.2
[15] 206901.0 196853.0 186805.0 176757.0 166709.0 156661.0 NA
[22] NA NA NA NA
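The NA values appear because approx does not extrapolate by default (rule = 1), and the x values used above are the lower bounds of the intervals, so nothing beyond 25000 is covered. Since the intervals have different widths, one possible variant is to interpolate the cumulative count at the interval breakpoints and then take differences. The sketch below assumes incomes are spread uniformly within each original interval; the names breaks, cum, grid and res are illustrative. It returns 24 counts for the 1000-wide bins starting at 6000, ..., 29000 rather than 25 point values.

```r
# Interval breakpoints and counts, taken from the question's table
breaks <- c(6000, 7500, 10000, 12000, 15000, 20000, 25000, 30000)
counts <- c(704790, 1294784, 1051902, 1585132, 704012, 206901, 156661)

# Cumulative count at each breakpoint (0 at the lowest bound)
cum <- c(0, cumsum(counts))

# Linear interpolation of the cumulative count on a regular 1000-wide grid;
# assumption: incomes are uniformly spread inside each original interval
grid <- seq(6000, 30000, by = 1000)
cum_grid <- approx(breaks, cum, xout = grid)$y

# Differences give the estimated count in each bin [grid[i], grid[i + 1])
res <- data.frame(Income = head(grid, -1), Numerosity = diff(cum_grid))
```

With this approach the estimated counts add up to the original total, and the first bin (6000 to 7000) receives two thirds of the 704790 observations in the 6000 to 7500 interval, about 469860.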