Tags: python, numpy, optimization, distribution

Optimize trivial summation in calculation of geometric distribution


def geo_dist_names(p, k):
    total = 0
    for i in range(1, 5):  # i = 1..4: one term per possible k (0..3)
        total += p**i
    return (p**(1+k))/total

p is a float between 0 and 1 and k is an int between 0 and 3. The function computes the geometric-distribution value p**(1+k) for the given p and k, then normalizes it by dividing by the sum of the values for all 4 possible k.
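Written out, the quantity described above is the following. The last form uses the finite geometric-series sum p + p² + p³ + p⁴ = p(1 − p⁴)/(1 − p), so it assumes 0 < p < 1. It also makes explicit that the denominator depends only on p, not on k.

```latex
P(k) = \frac{p^{k+1}}{\sum_{i=1}^{4} p^{i}}, \qquad k \in \{0, 1, 2, 3\}

\sum_{i=1}^{4} p^{i} = p\bigl(1 + p(1 + p(1 + p))\bigr) = \frac{p\,(1 - p^{4})}{1 - p} \quad (p \neq 1)

P(k) = \frac{p^{k}\,(1 - p)}{1 - p^{4}} \quad (0 < p < 1)
```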

It works, but I am calling this function many times, so I wondered whether there is a more efficient way to perform this operation.
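Because the normalizer depends only on p, one cheap pure-Python option is to drop the loop: evaluate p + p**2 + p**3 + p**4 in Horner form, p*(1 + p*(1 + p*(1 + p))), and, when several k are needed for the same p, compute all four values from a single normalizer. A minimal sketch, assuming Python 3.9+; geo_dist_horner and geo_dist_table are hypothetical names, and, like the original, they raise ZeroDivisionError at p = 0:

```python
def geo_dist_horner(p: float, k: int) -> float:
    """Hypothetical helper: p**(k+1) / (p + p**2 + p**3 + p**4), k in 0..3.

    The normalizer is evaluated in Horner form (three multiplications,
    three additions, no loop). Works at p = 1, where it equals 4.
    """
    norm = p * (1.0 + p * (1.0 + p * (1.0 + p)))
    return p ** (k + 1) / norm


def geo_dist_table(p: float) -> tuple[float, float, float, float]:
    """Hypothetical helper: all four normalized values for one p.

    The powers are built by repeated multiplication and the normalizer
    is computed once and shared by every k.
    """
    p2 = p * p
    p3 = p2 * p
    p4 = p3 * p
    norm = p + p2 + p3 + p4
    return (p / norm, p2 / norm, p3 / norm, p4 / norm)


# Usage: look up k in the table instead of recomputing the normalizer.
table = geo_dist_table(0.5)
print(table[2], geo_dist_horner(0.5, 2))
```

If the same p values recur across calls, the table can also be cached (for example keyed on p), but whether that helps depends on how the calling code varies p.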


Solution

  • A vectorized version of your code would be:

    import numpy as np
    
    def geo_dist_names(p, k):
        # Normalizer p**1 + ... + p**4: one term per possible k (0..3)
        return (p**(1+k))/(p**np.arange(1, 5)).sum()
    

    However, I'm not sure it will be faster than pure Python: the range is very small here, so NumPy's fixed per-call overhead (creating the array and dispatching the operations) is probably not negligible.
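    That per-call overhead is amortized if the calls can be batched. Assuming the caller can first gather many (p, k) pairs into arrays (an assumption about the surrounding code), a single NumPy call can handle all of them. A minimal sketch with a hypothetical geo_dist_batch helper; the random inputs exist only for illustration:

```python
import numpy as np


def geo_dist_batch(p, k):
    """Hypothetical helper: vectorized p**(k+1) / (p + p**2 + p**3 + p**4).

    p is a float array and k an int array with values 0..3; the two must
    have the same shape or be broadcastable to each other (scalars work).
    """
    p = np.asarray(p, dtype=float)
    k = np.asarray(k)
    # powers[..., j] holds p**(j + 1) for j = 0..3, i.e. p**1 .. p**4.
    powers = p[..., np.newaxis] ** np.arange(1, 5)
    norm = powers.sum(axis=-1)
    return p ** (k + 1) / norm


# Illustrative batch: 100,000 (p, k) pairs processed in one call.
rng = np.random.default_rng(0)
p_values = rng.uniform(0.05, 1.0, size=100_000)
k_values = rng.integers(0, 4, size=100_000)
result = geo_dist_batch(p_values, k_values)
```

    Whether this wins in practice depends on how p and k are produced; if they genuinely arrive one at a time, the scalar versions are likely the better fit.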

    Edit: timing confirms this. Given these two versions:

    def geo_dist_names_python(p, k, N=4):
        # The normalizer has N terms: p**1 + ... + p**N
        total = 0
        for i in range(1, N + 1):
            total += p**i
        return (p**(1+k))/total
    
    def geo_dist_names_numpy(p, k, N=4):
        return (p**(1+k))/(p**np.arange(1, N + 1)).sum()
    

    NumPy only pays off once the range N grows large enough:

    [Figure: run time of geo_dist_names_python vs. geo_dist_names_numpy as N increases]
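    To reproduce this kind of comparison on your own machine, a minimal timeit harness along these lines could be used. The function names, the choice of p and k, and the N values are illustrative; here n_terms is the number of terms in the normalizer. No timings are claimed, since they depend on hardware and library versions.

```python
import timeit

import numpy as np


def geo_dist_python(p, k, n_terms=4):
    # Pure-Python loop: normalizer is p**1 + ... + p**n_terms.
    total = 0.0
    for i in range(1, n_terms + 1):
        total += p ** i
    return p ** (1 + k) / total


def geo_dist_numpy(p, k, n_terms=4):
    # NumPy version: builds a small array on every call.
    return p ** (1 + k) / (p ** np.arange(1, n_terms + 1)).sum()


def compare(n_terms, number=2_000, p=0.7, k=2):
    """Return (python_seconds, numpy_seconds) for `number` calls each."""
    t_py = timeit.timeit(lambda: geo_dist_python(p, k, n_terms), number=number)
    t_np = timeit.timeit(lambda: geo_dist_numpy(p, k, n_terms), number=number)
    return t_py, t_np


if __name__ == "__main__":
    for n_terms in (4, 16, 64, 256):
        t_py, t_np = compare(n_terms)
        print(f"N={n_terms:4d}  python={t_py:.4f}s  numpy={t_np:.4f}s")
```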