I'm trying to better understand the prob argument of the sample function in R. Below I ask a question and provide a piece of R code to go with it.
Suppose I have generated 10,000 standard normal values with rnorm. I then want to draw a sample of size 5 from this parent set of 10,000 values.
How should I set the prob argument of sample so that the probability of drawing these 5 numbers reflects the fact that the middle of the parent rnorm distribution is denser and the tails are thinner (so that the 5 numbers are drawn from the denser areas more often than from the tails)?
x = rnorm(1e4)
sample( x = x, size = 5, replace = TRUE, prob = ? ) ## what should be "prob" here?
# OR I leave `prob` to be the default by not using it:
sample( x = x, size = 5, replace = TRUE )
Overthinking is the devil.
You want to resample these values so that the result follows the original distribution, or rather its empirical version. Think about how an empirical CDF is obtained:
plot(sort(x), 1:length(x)/length(x))
In other words, the empirical PDF (strictly a probability mass function, putting mass 1/length(x) on each observed value) is just
plot(sort(x), rep(1/length(x), length(x)))
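So we want prob = rep(1/length(x), length(x)), or simply prob = rep(1, length(x)), since sample normalizes prob internally. Or just leave it unspecified, as equal probability is the default. The denser middle is already favored: more of the 10,000 values lie there, so equal weights pick values from the middle more often than from the tails.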
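A quick way to convince yourself is to resample with equal weights and check where the draws land. The sketch below is only an illustration: the seeds, the large resample size (1e5 rather than 5, so that proportions are stable) and the variable names s_default and s_weighted are my own choices, not part of the question. It also shows what happens if you weight by dnorm(x) instead, which counts the density twice and gives a resample that is too narrow.

```r
set.seed(1)                # arbitrary seed, only for reproducibility
x <- rnorm(1e4)            # the parent draws from the question

# Equal weights (the default): every one of the 10,000 values is equally likely.
s_default <- sample(x, size = 1e5, replace = TRUE)

# Share of resampled values in the dense middle and in the thin tails.
# These should be near pnorm(1) - pnorm(-1) and 2 * pnorm(-2) respectively.
mean(abs(s_default) < 1)
mean(abs(s_default) > 2)

# Weighting by the density on top of that over-favors the middle:
# the resample then behaves like N(0, 1/2), so its sd is near sqrt(0.5), not 1.
set.seed(2)
s_weighted <- sample(x, size = 1e5, replace = TRUE, prob = dnorm(x))
sd(s_default)
sd(s_weighted)
```

So with equal weights the resample already looks like the parent normal, and any extra density-based weighting distorts it.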