Search code examples
algorithmsampling

"Programming Pearls": Sampling m elements from a sequence


From Programming Pearls: Column 12: A Sample Problem:

The input consists of two integers m and n, with m < n. The output is a sorted list of m random integers in the range 0..n-1 in which no integer occurs more than once. For probability buffs, we desire a sorted selection without replacement in which each selection occurs with equal probability.

The author provides one solution:

initialize set S to empty
size = 0
while size < m do
    t = bigrand() % n
    if t is not in S
        insert t into S
        size++
print the elements of S in sorted order

In the above pseudocode, bigrand() is a function returns a large random integer (much larger than m and n).

Can anyone help me prove the correctness of the above algorithm?

According to my understanding, every output should have the probability of 1/C(n, m). How to prove the above algorithm can guarantee the output with the probability of 1/C(n, m)?


Solution

  • Each solution this algorithm yields is valid.

    How many solutions are there? Up to last line there (sorting) are n*(n-1)*(n-2)*..*(n-m) different permutations or
    n!/(n-m)! and each result has same probability

    When you sort you reduce number of possible solutions by m!.

    So number of possible outputs is n!/((n-m)!*m!) and this is what you asked for.

    n!/((n-m)!m!) = C(n,m)