Search code examples
algorithmrandomreservoir-sampling

Reservoir sampling


To retrieve k random numbers from an array of undetermined size we use a technique called reservoir sampling. Can anybody briefly highlight how it happens with a sample code?


Solution

  • I actually did not realize there was a name for this, so I proved and implemented this from scratch:

    def random_subset(iterator, K):
        result = []
        N = 0
    
        for item in iterator:
            N += 1
            if len(result) < K:
                result.append(item)
            else:
                s = int(random.random() * N)
                if s < K:
                    result[s] = item
    
        return result
    

    From: http://web.archive.org/web/20141026071430/http://propersubset.com:80/2010/04/choosing-random-elements.html

    With proof near the end.