
How to efficiently cache and apply scipy gammaln to a numpy array?


I am looking for a way to cache the results of the gammaln function, apply it to a numpy array, and then sum the results. Currently I am able to cache the gammaln results using a simple dictionary approach, but it seems that I cannot iterate over the array efficiently. Below is an illustrative code snippet of the problem.

import time

import numpy as np
from scipy.special import gammaln

gammaln_cache_dic = {}
def gammaln_cache(x):
    if x not in gammaln_cache_dic:
        val = gammaln(x)
        gammaln_cache_dic[x] = val
    else:
        val = gammaln_cache_dic[x]
    return val

def cache_gammaln_sum(M):
    res_sum = 0.0
    for x in M:  # iterating a numpy array element by element is slow
        res_sum += gammaln_cache(x)
    return res_sum

np_array = np.random.randint(5, size=(500))*1000+3
start = time.time()
for i in np_array:
    gammaln(i)
end = time.time()
print("NO cache gammaln %f secs" % (end - start))

start = time.time()
for i in np_array:
    gammaln_cache(i)
end = time.time()
print("cache gammaln %f secs" % (end - start))

start = time.time()
gammaln(np_array).sum()
end = time.time()
print("NO cache gammaln sum %f secs" % (end - start))

start = time.time()
cache_gammaln_sum(np_array)
end = time.time()
print("cache gammaln sum %f secs" % (end - start))

Solution

  • If a function has been written to take advantage of the numpy compiled code (or itself is compiled), it will be hard to improve on its performance with caching.

    In [1]: from scipy.special import gammaln
    In [2]: np_array = np.random.randint(5, size=(500))*1000+3
    In [3]: np_array.shape
    Out[3]: (500,)
    In [4]: timeit gammaln(np_array).sum()
    ....
    10000 loops, best of 3: 34.9 µs per loop
    

    Evaluating the function one element at a time is much slower:

    In [5]: timeit sum([gammaln(i) for i in np_array])
    1000 loops, best of 3: 1.8 ms per loop
    

    Simply iterating through the array, without calling gammaln at all, is already slower than the vectorized call:

    In [6]: timeit sum([i for i in np_array])
    10000 loops, best of 3: 127 µs per loop
    

    In this case the values are a small set of integers

    In [10]: np.unique(np_array)
    Out[10]: array([   3, 1003, 2003, 3003, 4003])
    

    But identifying those unique values takes nearly as long as calculating the function for all 500.

    In [11]: timeit np.unique(np_array)
    ...
    In [12]: timeit gammaln(np.unique(np_array))
    

    In fact it is slower if I also ask for indices that can reconstruct the array:

    In [14]: timeit np.unique(np_array, return_inverse=True)    
    10000 loops, best of 3: 54 µs per loop
    

    There are caching or memoizing recipes for Python, including the @functools.lru_cache decorator. But I haven't seen them used much with numpy arrays. If your problem is suitable for caching, I suspect it is best done in lower-level code, such as Cython, using existing C or C++ cache libraries.
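
    For completeness, a minimal sketch of what the lru_cache approach could look like here (the name gammaln_cached is mine, not a library API):

    from functools import lru_cache

    import numpy as np
    from scipy.special import gammaln

    @lru_cache(maxsize=None)
    def gammaln_cached(x):
        # gammaln runs once per distinct argument; repeats hit the cache
        return gammaln(x)

    np_array = np.random.randint(5, size=500) * 1000 + 3
    # lru_cache needs hashable arguments, so convert each numpy scalar
    # to a plain int; the per-element Python call overhead remains,
    # which is the real bottleneck here
    total = sum(gammaln_cached(int(x)) for x in np_array)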

    Cache with unique

    We could get cache-like behavior with unique - use it to get both the unique values and the indices that let us recreate the original array:

    In [5]: unq, idx=np.unique(np_array, return_inverse=True)
    In [6]: res1= gammaln(np_array)
    In [7]: res2= gammaln(unq)
    In [8]: res2= res2[idx]
    In [9]: np.allclose(res1, res2)
    Out[9]: True
    

    But compare the times:

    In [10]: timeit res1= gammaln(np_array)
    10000 loops, best of 3: 29.1 µs per loop
    
    In [11]: %%timeit
        ...: unq, idx=np.unique(np_array, return_inverse=True)
        ...: res2= gammaln(unq)[idx]
    10000 loops, best of 3: 63.3 µs per loop
    

    Most of this extra time is in calculating unique; once we have the mapping, applying it is quite fast:

    In [12]: timeit res2=gammaln(unq)[idx]
    ...
    100000 loops, best of 3: 6.12 µs per loop
    

    unique with sum

    Returning the sum of the values rather than the values themselves shifts the relative times a bit:

    In [13]: %timeit res1= gammaln(np_array).sum()
    10000 loops, best of 3: 35.3 µs per loop
    In [14]: %%timeit
        ...: unq, idx=np.unique(np_array, return_inverse=True)
        ...: res2= gammaln(unq)[idx].sum()
    10000 loops, best of 3: 71.3 µs per loop
    

    With Paul Panzer's count approach:

    In [21]: %%timeit 
        ...: unq, cnt=np.unique(np_array, return_counts=True)
        ...: res=(gammaln(unq)*cnt).sum()
    10000 loops, best of 3: 59.5 µs per loop
    In [22]: %%timeit 
        ...: unq, cnt=np.unique(np_array, return_counts=True)
        ...: res=np.dot(gammaln(unq),cnt)
    10000 loops, best of 3: 52.1 µs per loop
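
    Putting the count idea into a reusable helper (a sketch; the name gammaln_sum is mine):

    import numpy as np
    from scipy.special import gammaln

    def gammaln_sum(arr):
        """Sum gammaln over arr, evaluating it once per unique value."""
        unq, cnt = np.unique(arr, return_counts=True)
        # np.dot fuses the multiply-by-count and the sum into one compiled call
        return np.dot(gammaln(unq), cnt)

    np_array = np.random.randint(5, size=500) * 1000 + 3
    assert np.isclose(gammaln_sum(np_array), gammaln(np_array).sum())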