Search code examples
pythoncythonpython-itertools

Speeding up itertools combinations with cython


I have the following code to generate all possible combinations in specified range using itertools but I cant get any speed improvements from using the code with cython. Original code is this:

from itertools import *

def x(e,f,g):
    a=[]        
    for c in combinations(range(e, f),g):
        d = list((c))
        a.append(d) 

and after declaring types for cython:

from itertools import *

cpdef x(int e,int f,int g):        
    cpdef tuple c
    cpdef list a
    cpdef list d

    a=[]        
    for c in combinations(range(e, f),g):    
        d = list((c))
        a.append(d)

I saved the latter as test_cy.pyx and compiled using cythonize -a -i test_cy.pyx

After compiling, I created a new script with the following code and ran it:

import test_cy

test_cy.x(1,45,6) 

I didnt get any significant speed improvement, still took about the same time as the original script, about 10.8 sec.

Is there anything I did wrong or is itertools already so optimised that there cant be any bigger improvements to its speed?


Solution

  • As already pointed out in the comments, you should not expect cython to speed-up your code because the most of the time the algorithm spends in itertools and creation of lists.

    Because I'm curios to see how itertools's generic implementation fares against old-school-tricks, let's take a look at this Cython implementation of "all subsets k out of n":

    %%cython
    
    ctypedef unsigned long long ull
    
    cdef ull next_subset(ull subset):
       cdef ull smallest, ripple, ones
       smallest = subset& -subset      
       ripple = subset + smallest    
       ones = subset ^ ripple    
       ones = (ones >> 2)//smallest 
       subset= ripple | ones    
       return subset
    
    
    cdef subset2list(ull subset, int offset, int cnt):  
        cdef list lst=[0]*cnt #pre-allocate
        cdef int current=0;
        cdef int index=0
        while subset>0:
            if((subset&1)!=0):
                lst[index]=offset+current
                index+=1
            subset>>=1
            current+=1
        return lst
    
    def all_k_subsets(int start, int end, int k):
        cdef int n=end-start      
        cdef ull MAX=1L<<n;
        cdef ull subset=(1L<<k)-1L;
        lst=[]
        while(MAX>subset):
             lst.append(subset2list(subset, start, k))
             subset=next_subset(subset)
        return lst
    

    This implementation uses some well-known bit-tricks and has the limitation, that it only works for at most 64 elements.

    If we compare both approaches:

    >>> %timeit x(1,45,6)
    2.52 s ± 108 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    >>> %timeit all_k_subsets(1,45,6)
    1.29 s ± 5.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    The speed-up of factor 2 is quite disappointing.

    However, the bottle-neck is the creation of the lists and not the calculation itself - it is easy to check, that without list creation the calculation would take about 0.1 seconds.

    My take away from it: if you are serious about speed you should not create so many lists but proceed the subset on the fly (best in cython) - a speed-up of more than 10 is possible. If it is a must to have all subsets as lists, so you cannot expect a huge speed-up.