def geo_dist_names(p, k):
    total = 0
    for i in range(1, 4):
        total += p**i
    return (p**(1+k))/total
p is a float between 0 and 1 and k is an int between 0 and 3. The function finds the value of a geometric distribution for the given p and k, then normalizes it by dividing by the sum of the 4 potential values for k.
It works, but I am calling this function many times, so I wondered whether there is a more optimized way of performing this operation?
A vectorized version of your code would be:
import numpy as np

def geo_dist_names(p, k):
    return (p**(1+k))/(p**np.arange(1,4)).sum()
However, I'm not sure it will be faster than pure Python: the range is quite small here, so NumPy's overhead is probably not negligible.
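Since the per-call overhead dominates for such a short range, another option worth trying is to drop the loop altogether. The denominator is a finite geometric series, so for p != 1 it has a closed form. The sketch below is only an illustration under that assumption: `geo_dist_names_closed` and `seconds_per_call` are hypothetical names of mine, not part of the question, and the `N` parameter is added so the range length can be varied.

```python
import math
import timeit


def geo_dist_names_loop(p, k, N=4):
    # Same computation as the loop in the question:
    # p**(1+k) divided by the sum of p**i for i = 1 .. N-1.
    total = 0.0
    for i in range(1, N):
        total += p**i
    return p**(1 + k) / total


def geo_dist_names_closed(p, k, N=4):
    # Hypothetical helper (not from the original post).
    # For p != 1: sum_{i=1}^{N-1} p**i == p * (1 - p**(N-1)) / (1 - p),
    # so the ratio simplifies to p**k * (1 - p) / (1 - p**(N-1)).
    if p == 1:
        # Every term of the sum equals 1, so the sum is N - 1.
        return 1.0 / (N - 1)
    return p**k * (1 - p) / (1 - p**(N - 1))


def seconds_per_call(func, p=0.3, k=2, N=4, number=20_000):
    # Average wall-clock time of one call, measured with timeit.
    return timeit.timeit(lambda: func(p, k, N), number=number) / number


if __name__ == "__main__":
    # Both versions should agree up to floating-point rounding.
    assert math.isclose(geo_dist_names_loop(0.3, 2), geo_dist_names_closed(0.3, 2))
    for func in (geo_dist_names_loop, geo_dist_names_closed):
        print(func.__name__, seconds_per_call(func))
```

Two caveats: at p == 0 the loop version raises ZeroDivisionError while the closed form returns a number, and for p extremely close to 1 the term 1 - p**(N-1) loses precision. Whether this is actually faster on your machine is something to measure rather than assume.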
Edit. Indeed, assuming:
def geo_dist_names_python(p, k, N=4):
    total = 0
    for i in range(1, N):
        total += p**i
    return (p**(1+k))/total

def geo_dist_names_numpy(p, k, N=4):
    return (p**(1+k))/(p**np.arange(1,N)).sum()
NumPy is faster only once the range gets larger: