Tags: python, time-complexity, complexity-theory, space-complexity

How do you take randomness into account when finding the complexity of a function?


I've been trying to understand complexity, but all the online material has left me confused, especially the part where they build actual mathematical functions. I have a for loop and a while loop, and my confusion comes from the while loop. I know the complexity of the for loop is O(n), but the while loop depends on randomness: a random number is picked, and if this number is not in the list, it is added and the while loop breaks. My confusion is that the while loop may run (in my thinking) for a worst case of m times until it is done. So I was thinking the complexity would then be O(n*m)?
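Here's a rough sketch of the kind of loop structure I mean (the names are just placeholders, not my exact code):

```python
import random

def pick_unique(n):
    result = []
    for _ in range(n):                    # outer loop runs n times
        while True:                       # keep drawing until we hit a new number
            candidate = random.randint(1, n)
            if candidate not in result:   # membership test on a list costs O(len(result))
                result.append(candidate)
                break
    return result
```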

I'm just really lost, and need some help.


Solution

  • Technically the worst-case complexity is O(inf), i.e. unbounded: random.randint, if we consider it a true random generator (it isn't, of course), can produce an arbitrarily long run of already-seen values. However, we can estimate an "average-case" complexity. It isn't the real average-case complexity (best, worst and average cases must be defined over inputs, not over random choices), but it does show how many iterations the program will do on average if we run it for a fixed n many times and average the results.

    Let's note that the list works as a set here (you never add a repeated number), so I'd stick with a not in set check instead, which is O(1) (while not in list is O(i)), to remove that source of complexity and simplify things a bit: now the iteration count and the complexity can be estimated with the same big-O bounds (see the first sketch after this answer). A single trial here is a draw from the uniform integer distribution on [1, n]; a success is drawing a number that is not in the set yet.

    Then what is the expected number of trials before we get an item that is not in the set? The set size before each step is i in your code, and any of the remaining n-i numbers would be new, so the probability of success is p_i = (n-i)/n (since the distribution is uniform). Each outer iteration is therefore an instance of the geometric distribution: the number of trials up to and including the first success. So the expected count of while iterations is n_i = 1/p_i = n/(n-i). To get the final complexity we sum these counts over the for iterations: sum(n_i for i in range(n)). This is equal to n * Harmonic(n), where Harmonic(n) is the n-th harmonic number (the sum of the reciprocals of the first n natural numbers). Since Harmonic(n) ~ O(log n), the "average-case" complexity of this code is O(n log n); the second sketch after this answer checks that estimate empirically.

    For a list it would be sum(i * n / (n - i) for i in range(n)) ~ O(n^2 log n) (the proof of that equality is a little longer).
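For illustration, here is a minimal sketch of the set-based variant described above; the function name and the exact loop shape are assumptions based on the question, not the asker's actual code:

```python
import random

def pick_unique_set(n):
    """Collect the numbers 1..n in random order, rejecting duplicates via a set."""
    seen = set()                           # set membership test is O(1) on average
    result = []
    for _ in range(n):
        while True:
            candidate = random.randint(1, n)   # uniform draw from [1, n]
            if candidate not in seen:          # O(1) instead of O(i) for a list
                seen.add(candidate)
                result.append(candidate)
                break
    return result
```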
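And a rough way to sanity-check the n * Harmonic(n) estimate empirically: count the total while iterations over several runs and compare with the prediction. This is a simulation sketch, not a proof, and the choices of n and the number of trials are arbitrary.

```python
import random

def total_while_iterations(n):
    """Total number of while-loop iterations for one run of the set-based version."""
    seen = set()
    iterations = 0
    for _ in range(n):
        while True:
            iterations += 1
            candidate = random.randint(1, n)
            if candidate not in seen:
                seen.add(candidate)
                break
    return iterations

def harmonic(n):
    """n-th harmonic number: sum of the reciprocals of the first n natural numbers."""
    return sum(1.0 / k for k in range(1, n + 1))

n, trials = 1000, 100
measured = sum(total_while_iterations(n) for _ in range(trials)) / trials
print(f"measured average: {measured:.0f}, predicted n*Harmonic(n): {n * harmonic(n):.0f}")
```

With uniform draws, the measured average should land close to the predicted n * Harmonic(n) value, up to random fluctuation.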