Search code examples
javaalgorithmrandomreservoir-sampling

Reservoir Sampling Algorithm


I want to understand the reservoir sampling algorithm where we select k elements out of the given set of S elements such that k <= S.

In the algorithm given on wiki:

array R[k];    // result
integer i, j;

 // fill the reservoir array

for each i in 1 to k do
    R[i] := S[i]
done;

// replace elements with gradually decreasing probability

for each i in k+1 to length(S) do
    j := random(1, i);   // important: inclusive range
    if j <= k then
        R[j] := S[i]
    fi
done

If I understand this correctly, we first select k elements from the set and then continuously parse i elements of S, generate the random no j in the range 1 to i and replace the element j with S[i].

It looks fine if the set K to be sampled is very large, but if I want to pick just 1 element from a linked list of infinite size(at least unknown size) at random, how will I do it with this algorithm...?


Solution

  • The reservoir sampling algorithm works on any sized linked list, even one whose length is unknown in advance. In fact, one of the main selling points of reservoir sampling is that it works on data streams whose size is not known in advance.

    If you set k = 1 and then run the normal reservoir sampling algorithm, then you should correctly get a uniformly random element from the list.

    Hope this helps!