Tags: python, product, python-itertools

Python itertools.product is slow: is the write speed to an output file the bottleneck?


I have a simple Python function that uses itertools.product, as shown below.

def cart(n, seq):
    import itertools
    b = 8
    while b < n:
        n = n - 1
        for p in itertools.product(seq, repeat=n):
            file.write(''.join(p))  # `file` is a global file object opened elsewhere
            file.write('\n')

The function works, but it is extremely slow, and it is not even using a noticeable amount of resources. I was wondering if the bottleneck is the disk write speed? Currently the script is averaging 2.5 MB per second. I also tried writing to a solid-state drive and received the same speed, which leads me to believe the write speed is not the bottleneck. Is there a way to speed this function up and use more system resources, or is itertools just slow? Forgive me, I am new to Python.


Solution

  • You can profile your code to get an idea of the location of the bottleneck. The following will create a file called "cart_stats.txt" with the profiling information in it. Running it myself seems to indicate that most of the time is being spent calling file.write().

    from cProfile import Profile
    from pstats import Stats
    prof = Profile()
    prof.disable()
    
    file = open('cart_output.txt', 'wt')
    
    def cart(n, seq):
        import itertools
        b = 8
        while b < n:
            n = n - 1
            for p in itertools.product(seq, repeat=n):
                file.write(''.join(p))
                file.write('\n')
    
    prof.enable()
    cart(10, 'abc')
    prof.disable()
    
    prof.dump_stats('cart.stats')
    with open('cart_stats.txt', 'wt') as output:
        stats = Stats('cart.stats', stream=output)
        stats.sort_stats('cumulative', 'time')
        stats.print_stats()
    
    file.close()
    print('done')
    
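To sanity-check the disk-speed hypothesis from the question directly, raw sequential write throughput can be timed on its own. This is a sketch rather than part of the original answer; the file name, chunk size, and total size are arbitrary choices:

```python
import os
import time

def measure_write_speed(path, total_mb=64, chunk_kb=64):
    """Time raw sequential writes so disk throughput can be compared
    against what cart() achieves."""
    chunk = b'x' * (chunk_kb * 1024)
    n_chunks = (total_mb * 1024) // chunk_kb
    start = time.perf_counter()
    with open(path, 'wb') as f:
        for _ in range(n_chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # force data to the device, not just the OS cache
    elapsed = time.perf_counter() - start
    os.remove(path)
    return total_mb / elapsed  # MB per second

print(round(measure_write_speed('bench.tmp')), 'MB/s')
```

If the number printed here is far above 2.5 MB/s, the disk itself is not the limiting factor, which matches what the profiler shows below.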

    FWIW, the slowness seems to be overwhelmingly due to the calls to file.write() themselves, because it remains even if I open() the output stream with a huge buffer or make it a StringIO instance. I was able to reduce it significantly by optimizing and minimizing those calls, as shown below:

    def cart(n, seq):
        import itertools
        b = 8
        write = file.write  # speed up lookup of method
        while b < n:
            n = n - 1
            for p in itertools.product(seq, repeat=n):
                write(''.join(p)+'\n')  # only call it once in loop
    

    This shows that having a profiler in place can be the best way to determine where to spend your time to get the most benefit.
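The gain from minimizing write() calls can also be demonstrated in isolation with an in-memory buffer, which takes the disk out of the picture entirely. This is a sketch with arbitrary parameters, not part of the original answer:

```python
import io
import timeit
from itertools import product

seq, n = 'abc', 7
items = [''.join(p) for p in product(seq, repeat=n)]

def many_writes():
    # two write() calls per item, mirroring the original cart()
    buf = io.StringIO()
    for s in items:
        buf.write(s)
        buf.write('\n')
    return buf.getvalue()

def one_write():
    # a single write() of the pre-joined output
    buf = io.StringIO()
    buf.write('\n'.join(items) + '\n')
    return buf.getvalue()

assert many_writes() == one_write()  # both produce identical output
t_many = timeit.timeit(many_writes, number=100)
t_one = timeit.timeit(one_write, number=100)
print(f'many writes: {t_many:.3f}s  one write: {t_one:.3f}s')
```

Because both functions produce the same string, any timing difference is purely the per-call overhead of write() and the method lookup inside the loop.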

    Update:

    Here's a version that stores all of the output in memory before making a single file.write() call. It is significantly faster than using StringIO.StringIO because it is less general, but it is still not quite as fast as using a cStringIO.StringIO instance.

    file = open('cart_output.txt', 'wt')
    
    def cart(n, seq):
        from itertools import product
        buflist = []
        append = buflist.append
        b = 8
        while b < n:
            n = n - 1
            for p in product(seq, repeat=n):
                append(''.join(p))
        file.write('\n'.join(buflist)+'\n')
    
    cart(10, 'abc')
    file.close()
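The update above trades memory for speed by buffering all output before a single write, which can be costly for large n. A middle ground, not from the original answer, is to flush the joined buffer every few thousand items so memory stays bounded while write() calls stay infrequent; cart_chunked and the chunk parameter are names of my own choosing:

```python
from itertools import product

def cart_chunked(path, n, seq, chunk=10000):
    """Buffer joined lines and flush every `chunk` items, keeping
    memory bounded while making few write() calls."""
    with open(path, 'wt') as f:
        buf = []
        b = 8
        while b < n:
            n -= 1
            for p in product(seq, repeat=n):
                buf.append(''.join(p))
                if len(buf) >= chunk:
                    f.write('\n'.join(buf) + '\n')
                    buf.clear()
        if buf:  # flush whatever is left over
            f.write('\n'.join(buf) + '\n')

cart_chunked('cart_output.txt', 10, 'abc')
```

The chunk size is a tuning knob: larger chunks mean fewer write() calls but more memory, and values in the thousands are usually enough to amortize the per-call overhead.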