Speeding up itertools combinations with cython

I have the following code to generate all possible combinations in specified range using itertools but I cant get any speed improvements from using the code with cython. Original code is this:

from itertools import *

def x(e,f,g):
    a=[]        
    for c in combinations(range(e, f),g):
        d = list((c))
        a.append(d)

and after declaring types for cython:

from itertools import *

cpdef x(int e,int f,int g):        
    cpdef tuple c
    cpdef list a
    cpdef list d

    a=[]        
    for c in combinations(range(e, f),g):    
        d = list((c))
        a.append(d)

I saved the latter as test_cy.pyx and compiled using cythonize -a -i test_cy.pyx

After compiling, I created a new script with the following code and ran it:

import test_cy

test_cy.x(1,45,6)

I didnt get any significant speed improvement, still took about the same time as the original script, about 10.8 sec.

Is there anything I did wrong or is itertools already so optimised that there cant be any bigger improvements to its speed?

Solution

As already pointed out in the comments, you should not expect cython to speed-up your code because the most of the time the algorithm spends in itertools and creation of lists.

Because I'm curios to see how itertools's generic implementation fares against old-school-tricks, let's take a look at this Cython implementation of "all subsets k out of n":

%%cython

ctypedef unsigned long long ull

cdef ull next_subset(ull subset):
   cdef ull smallest, ripple, ones
   smallest = subset& -subset      
   ripple = subset + smallest    
   ones = subset ^ ripple    
   ones = (ones >> 2)//smallest 
   subset= ripple | ones    
   return subset


cdef subset2list(ull subset, int offset, int cnt):  
    cdef list lst=[0]*cnt #pre-allocate
    cdef int current=0;
    cdef int index=0
    while subset>0:
        if((subset&1)!=0):
            lst[index]=offset+current
            index+=1
        subset>>=1
        current+=1
    return lst

def all_k_subsets(int start, int end, int k):
    cdef int n=end-start      
    cdef ull MAX=1L<<n;
    cdef ull subset=(1L<<k)-1L;
    lst=[]
    while(MAX>subset):
         lst.append(subset2list(subset, start, k))
         subset=next_subset(subset)
    return lst

This implementation uses some well-known bit-tricks and has the limitation, that it only works for at most 64 elements.

If we compare both approaches:

>>> %timeit x(1,45,6)
2.52 s ± 108 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit all_k_subsets(1,45,6)
1.29 s ± 5.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The speed-up of factor 2 is quite disappointing.

However, the bottle-neck is the creation of the lists and not the calculation itself - it is easy to check, that without list creation the calculation would take about 0.1 seconds.

My take away from it: if you are serious about speed you should not create so many lists but proceed the subset on the fly (best in cython) - a speed-up of more than 10 is possible. If it is a must to have all subsets as lists, so you cannot expect a huge speed-up.