Tags: python, multithreading, python-itertools

How do I "multi-process" the itertools product module?


I'm trying to calculate millions and millions of different combinations of the string below, but I'm only getting through roughly 1,750 combinations per second, which isn't anywhere near the speed I need. How would I reshape this so that multiple processes calculate different parts of the same thing, without recalculating parts that have already been done, while still maintaining fast speeds? The code below is partially what I've been using. Any examples would be appreciated!

from itertools import product
for chars in product("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890!@#$%^&*?,()-=+[]/;", repeat=4):
    print(chars)

Solution

  • One way to break the product up into parts is to split the first component of the product, so that each independent job handles all the combinations starting with a certain set of first characters. For example:

    import string
    import multiprocessing as mp
    import itertools
    
    alphabet = string.ascii_letters+string.digits+"!@#$%^&*?,()-=+[]/;"
    num_parts = 4
    part_size = len(alphabet) // num_parts
    
    def do_job(first_bits):
        # Each job iterates over every 4-character combination whose first
        # character comes from its own slice of the alphabet.
        for x in itertools.product(first_bits, alphabet, alphabet, alphabet):
            print(x)
    
    if __name__ == "__main__":
        pool = mp.Pool()
        results = []
        for i in range(num_parts):
            # The last slice also picks up any leftover characters when the
            # alphabet doesn't divide evenly into num_parts pieces.
            if i == num_parts - 1:
                first_bit = alphabet[part_size * i :]
            else:
                first_bit = alphabet[part_size * i : part_size * (i+1)]
            # Pass the function and its arguments separately; writing
            # do_job(first_bit) would run the job in the parent process
            # and hand apply_async its return value instead.
            results.append(pool.apply_async(do_job, (first_bit,)))
    
        pool.close()
        pool.join()
    

    (where obviously you'd only use results if do_job actually returned something).
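
    If you did want do_job to hand something back, the AsyncResult objects stored in results let you collect it with .get(). Here's a minimal sketch under that assumption, where a hypothetical variant of do_job counts the combinations in its slice instead of printing them:

    import string
    import multiprocessing as mp
    import itertools

    alphabet = string.ascii_letters + string.digits + "!@#$%^&*?,()-=+[]/;"
    num_parts = 4
    part_size = len(alphabet) // num_parts

    def do_job(first_bits):
        # Hypothetical variant: count the combinations in this slice
        # instead of printing them, and return the count to the parent.
        count = 0
        for x in itertools.product(first_bits, alphabet, alphabet, alphabet):
            count += 1
        return count

    if __name__ == "__main__":
        pool = mp.Pool()
        results = []
        for i in range(num_parts):
            if i == num_parts - 1:
                first_bit = alphabet[part_size * i :]
            else:
                first_bit = alphabet[part_size * i : part_size * (i+1)]
            results.append(pool.apply_async(do_job, (first_bit,)))

        pool.close()
        pool.join()

        # .get() returns each worker's return value (and re-raises any
        # exception raised inside the worker).
        totals = [r.get() for r in results]
        print(sum(totals))  # should equal len(alphabet) ** 4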