Search code examples
algorithmmathrandomstatisticsprobability

Select k random elements from a list whose elements have weights


Selecting without any weights (equal probabilities) is beautifully described here.

I was wondering if there is a way to convert this approach to a weighted one.

I am also interested in other approaches as well.

Update: Sampling without replacement


Solution

  • I know this is a very old question, but I think there's a neat trick to do this in O(n) time if you apply a little math!

    The exponential distribution has two very useful properties.

    1. Given n samples from different exponential distributions with different rate parameters, the probability that a given sample is the minimum is equal to its rate parameter divided by the sum of all rate parameters.

    2. It is "memoryless". So if you already know the minimum, then the probability that any of the remaining elements is the 2nd-to-min is the same as the probability that if the true min were removed (and never generated), that element would have been the new min. This seems obvious, but I think because of some conditional probability issues, it might not be true of other distributions.

    Using fact 1, we know that choosing a single element can be done by generating these exponential distribution samples with rate parameter equal to the weight, and then choosing the one with minimum value.

    Using fact 2, we know that we don't have to re-generate the exponential samples. Instead, just generate one for each element, and take the k elements with lowest samples.

    Finding the lowest k can be done in O(n). Use the Quickselect algorithm to find the k-th element, then simply take another pass through all elements and output all lower than the k-th.

    A useful note: if you don't have immediate access to a library to generate exponential distribution samples, it can be easily done by: -ln(rand())/weight