I want to understand the reservoir sampling algorithm where we select k elements out of the given set of S elements such that k <= S.
In the algorithm given on wiki:
array R[k]; // result
integer i, j;
// fill the reservoir array
for each i in 1 to k do
R[i] := S[i]
done;
// replace elements with gradually decreasing probability
for each i in k+1 to length(S) do
j := random(1, i); // important: inclusive range
if j <= k then
R[j] := S[i]
fi
done
If I understand this correctly, we first select k elements from the set and then continuously parse i elements of S, generate the random no j in the range 1 to i and replace the element j with S[i].
It looks fine if the set K to be sampled is very large, but if I want to pick just 1 element from a linked list of infinite size(at least unknown size) at random, how will I do it with this algorithm...?
The reservoir sampling algorithm works on any sized linked list, even one whose length is unknown in advance. In fact, one of the main selling points of reservoir sampling is that it works on data streams whose size is not known in advance.
If you set k = 1 and then run the normal reservoir sampling algorithm, then you should correctly get a uniformly random element from the list.
Hope this helps!