I have the following code to generate all possible combinations in specified range using itertools but I cant get any speed improvements from using the code with cython. Original code is this:
from itertools import *
def x(e,f,g):
a=[]
for c in combinations(range(e, f),g):
d = list((c))
a.append(d)
and after declaring types for cython:
from itertools import *
cpdef x(int e,int f,int g):
cpdef tuple c
cpdef list a
cpdef list d
a=[]
for c in combinations(range(e, f),g):
d = list((c))
a.append(d)
I saved the latter as test_cy.pyx
and compiled using cythonize -a -i test_cy.pyx
After compiling, I created a new script with the following code and ran it:
import test_cy
test_cy.x(1,45,6)
I didnt get any significant speed improvement, still took about the same time as the original script, about 10.8 sec.
Is there anything I did wrong or is itertools already so optimised that there cant be any bigger improvements to its speed?
As already pointed out in the comments, you should not expect cython to speed-up your code because the most of the time the algorithm spends in itertools and creation of lists.
Because I'm curios to see how itertools
's generic implementation fares against old-school-tricks, let's take a look at this Cython implementation of "all subsets k out of n":
%%cython
ctypedef unsigned long long ull
cdef ull next_subset(ull subset):
cdef ull smallest, ripple, ones
smallest = subset& -subset
ripple = subset + smallest
ones = subset ^ ripple
ones = (ones >> 2)//smallest
subset= ripple | ones
return subset
cdef subset2list(ull subset, int offset, int cnt):
cdef list lst=[0]*cnt #pre-allocate
cdef int current=0;
cdef int index=0
while subset>0:
if((subset&1)!=0):
lst[index]=offset+current
index+=1
subset>>=1
current+=1
return lst
def all_k_subsets(int start, int end, int k):
cdef int n=end-start
cdef ull MAX=1L<<n;
cdef ull subset=(1L<<k)-1L;
lst=[]
while(MAX>subset):
lst.append(subset2list(subset, start, k))
subset=next_subset(subset)
return lst
This implementation uses some well-known bit-tricks and has the limitation, that it only works for at most 64 elements.
If we compare both approaches:
>>> %timeit x(1,45,6)
2.52 s ± 108 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit all_k_subsets(1,45,6)
1.29 s ± 5.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The speed-up of factor 2 is quite disappointing.
However, the bottle-neck is the creation of the lists and not the calculation itself - it is easy to check, that without list creation the calculation would take about 0.1 seconds.
My take away from it: if you are serious about speed you should not create so many lists but proceed the subset on the fly (best in cython) - a speed-up of more than 10 is possible. If it is a must to have all subsets as lists, so you cannot expect a huge speed-up.