r, statistics, sampling, resampling

How should I specify argument "prob" when using sample() for resampling?


In short

I'm trying to better understand the prob argument of the sample function in R. Below I ask a question and provide a piece of R code to go with it.

Question

Suppose I have generated 10,000 standard normal values with rnorm. I then want to draw a sample of size 5 from this "mother" set of 10,000 values.

How should I set the prob argument of sample so that the draw reflects the shape of the mother distribution, which is denser in the middle and thinner in the tails? In other words, the 5 numbers should come from the denser areas more often than from the tails.

x = rnorm(1e4)
# sample(x = x, size = 5, replace = TRUE, prob = ?)  ## what should "prob" be here?
# OR leave `prob` at its default by omitting it:
sample(x = x, size = 5, replace = TRUE)

Solution

  • Overthinking is the devil.

    You want to resample these values so that the resample follows the original distribution, that is, the empirical distribution of x. Think about how an empirical CDF is obtained:

    plot(sort(x), 1:length(x)/length(x))
    

    In other words, the empirical distribution puts the same probability mass, 1/length(x), on every observed value:

    plot(sort(x), rep(1/length(x), length(x)))
    

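    So we want prob = rep(1/length(x), length(x)), or simply prob = rep(1, length(x)), since sample normalizes prob internally. Or just leave prob unspecified, because equal probability is the default. The denser middle of the distribution is drawn more often anyway, because more of the 10,000 values lie there.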
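The point above can be checked numerically. The following is a minimal sketch, not part of the original answer: the seed, the resample size of 1e5, the "within one unit of zero" summary, and the variable names (r_default, r_equal, r_dnorm, frac_within_1) are all illustrative assumptions. It compares equal-probability resampling with a resample weighted by dnorm(x), which is the kind of weighting the question hints at.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
x <- rnorm(1e4)
n <- length(x)

# Equal weights: each observation has mass 1/n, so the resample follows
# the empirical distribution of x (denser regions contribute more points).
r_default <- sample(x, size = 1e5, replace = TRUE)
r_equal   <- sample(x, size = 1e5, replace = TRUE, prob = rep(1, n))

# Weighting by the normal density counts the density a second time and
# over-concentrates the resample around the centre.
r_dnorm   <- sample(x, size = 1e5, replace = TRUE, prob = dnorm(x))

# Share of values within one unit of zero, for each variant
frac_within_1 <- c(
  original = mean(abs(x) <= 1),
  default  = mean(abs(r_default) <= 1),
  equal    = mean(abs(r_equal) <= 1),
  dnorm    = mean(abs(r_dnorm) <= 1)
)
print(round(frac_within_1, 3))
```

The first three shares should come out close to each other (around 0.68 for standard normal data), while the dnorm-weighted share should be noticeably larger. That is why no density-based prob is needed: the values in x already carry the density through how often they occur.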