Search code examples
mathoptimizationpseudocode

Turning a random number generating loop into a one equation?


Here's some pseudocode:

count = 0
for every item in a list
    1/20 chance to add one to count

This is more or less my current code, but there could be hundreds of thousands of items in that list; therefore, it gets inefficient fast. (isn't this called like, 0(n) or something?)

Is there a way to compress this into one equation?


Solution

  • Let's look at the properties of the random variable you've described. Quoting Wikipedia:

    The binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.

    Let N be the number of items in the list, and C be a random variable that represents the count you're obtaining from your pseudocode. C will follow a binomial probability distribution (as shown in the image below), with p = 1/20:

    enter image description here

    The remaining problem is how to efficently poll a random variable with said probability distribution. There are a number of libraries that allow you to draw samples from random variables with a specified PDF. I've never had to implement it myself, so I don't exactly know the details, but many are open source and you can refer to the implementation for yourself.

    Here's how you would calculate count with the numpy library in Python:

    n, p = 10, 0.05                  # 10 trials, probability of success is 0.05
    count = np.random.binomial(n, p) # draw a single sample