Tags: r, random, distribution, sample, weighted

R - random numbers in a similar distribution to real numbers


This is a very simplified example but hopefully it gives everyone an idea what I'm talking about:

real.length = c(10,11,12,13,13,13,13,14,15,50)

# Draw one uniform random integer per real length, within the observed range
random.length = numeric(length(real.length))
for (i in seq_along(real.length)){
    random.length[i] = sample(min(real.length):max(real.length),1)
}

(NB: I know I could just say random.length=sample(min:max,10) but I need the loop in my real code.)

I would like my random lengths to have a similar range to my real lengths, but also a similar distribution. I've tried rnorm, but my real data isn't normally distributed, so I don't think that will work unless there are some options I've missed.

Is it possible to set the sample function's prob argument using my real data? In this case, that would mean giving a higher weight/probability to numbers between 10 and 15 and a lower weight/probability to a high number like 50.

EDIT: Using James's solution:

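samples = length(real.length)
d = density(real.length)
# Draw 100 extra values so enough remain after dropping non-positive ones
random.length = d$x[findInterval(runif(samples+100),cumsum(d$y)/sum(d$y))]
# Keep only positive lengths, then trim to the required sample size
random.length = subset(random.length, random.length>0)
random.length = random.length[1:samples]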

Solution

  • You can create a density estimate and sample from that:

    d <- density(real.length)
    d$x[findInterval(runif(6),cumsum(d$y)/sum(d$y))]
    [1] 13.066019 49.591973  9.636352 15.209561 11.951377 12.808794
    

    Note that this assumes that your variable is continuous, so round as you see fit.
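
If you want integer lengths straight from sample(), one possible variation (a sketch, not part of the answer above) is to evaluate the density estimate at each integer in the observed range and pass those values as prob. The names candidates and weights are made up for this illustration, and the seed is arbitrary.

```r
real.length <- c(10, 11, 12, 13, 13, 13, 13, 14, 15, 50)

# Kernel density estimate of the observed lengths
d <- density(real.length)

# Hypothetical helper vector: every integer in the observed range
candidates <- min(real.length):max(real.length)

# Interpolate the density at each candidate to get unnormalised weights.
# sample() rescales prob itself, so the weights need not sum to 1.
weights <- approx(d$x, d$y, xout = candidates)$y

set.seed(1)  # arbitrary seed, only so the draw can be repeated
random.length <- sample(candidates, length(real.length),
                        replace = TRUE, prob = weights)
```

Inside a loop, the same weights can be reused with sample(candidates, 1, prob = weights). Because the candidates are limited to min(real.length):max(real.length), every draw is a positive integer within the observed range, so no rounding or filtering step is needed afterwards.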