Tags: python, multithreading, optimization, multiprocessing, nested-loops

What is the easiest way to get maximum CPU usage out of nested for-loops?


I have code that generates unique combinations of elements. There are 6 types, with about 100 elements of each, so there are about 100^6 combinations. Each combination has to be calculated, checked for relevance, and then either discarded or saved.

The relevant bit of the code looks like this:

def modconffactory():
    for transmitter in totaltransmitterdict.values():
        for reciever in totalrecieverdict.values():
            for processor in totalprocessordict.values():
                for holoarray in totalholoarraydict.values():
                    for databus in totaldatabusdict.values():
                        for multiplexer in totalmultiplexerdict.values():
                            newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
                            data_I_need = dosomethingwith(newconfiguration)
                            saveforlateruse_if_useful(data_I_need)

This takes a long time, and that is fine, but I now realize this process (making the configurations and then the calculations for later use) is only using 1 of my 8 processor cores at a time.

I've been reading up on multithreading and multiprocessing, but I only see examples of different processes, not how to multithread one process. In my code I call two functions: 'dosomethingwith()' and 'saveforlateruse_if_useful()'. I could make those into separate processes and have them run concurrently with the for-loops, right?

But what about the for-loops themselves? Can I speed up that one process? Because that is where the time is spent. (<-- This is my main question)

Is there a cheat? For instance, compiling to C so that the OS multithreads it automatically?


Solution

  • I only see examples of different processes, not how to multithread one process

    There is multithreading in Python, but for CPU-bound work like yours it is ineffective because of the GIL (Global Interpreter Lock), which lets only one thread execute Python bytecode at a time. So if you want to use all of your processor cores, if you want concurrency, you have no other choice than to use multiple processes, which can be done with the multiprocessing module (well, you could also use another language that does not have this problem).
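
    If you want to see the GIL's effect for yourself, here is an illustrative toy benchmark (not from the original answer; the exact numbers will vary by machine) comparing threads and processes on pure-Python CPU-bound work:

    import time
    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    def burn(n):
        # Pure-Python CPU-bound work; the GIL prevents threads from running this in parallel
        total = 0
        for i in range(n):
            total += i * i
        return total

    if __name__ == '__main__':
        for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
            start = time.time()
            with executor_cls(max_workers=4) as ex:
                list(ex.map(burn, [10_000_000] * 4))
            print(executor_cls.__name__, time.time() - start)

    On a multi-core machine the process pool finishes roughly in the time of one task, while the thread pool takes about four times as long.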

    An approximate example of multiprocessing usage for your case:

    import multiprocessing
    
    WORKERS_NUMBER = 8
    
    def modconffactoryProcess(generator, step, offset, conn):
        """
        Function to be invoked by every worker process.
    
        generator: iterable object, the outermost one of all you are iterating
        over; in your case, totaltransmitterdict.values()
    
        We pass the whole iterable to every worker, and they all iterate over
        it. To ensure they do not waste time doing the same work concurrently,
        each worker processes only every step-th item, starting with the
        offset-th one. step must be equal to WORKERS_NUMBER, and offset must be
        a unique number for each worker, ranging from 0 to WORKERS_NUMBER - 1.
    
        conn: a multiprocessing.Connection object, allowing the worker to
        communicate with the main process
        """
        for i, transmitter in enumerate(generator):
            if i % step == offset:
                for reciever in totalrecieverdict.values():
                    for processor in totalprocessordict.values():
                        for holoarray in totalholoarraydict.values():
                            for databus in totaldatabusdict.values():
                                for multiplexer in totalmultiplexerdict.values():
                                    newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
                                    data_I_need = dosomethingwith(newconfiguration)
                                    saveforlateruse_if_useful(data_I_need)
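        # Tell the main process that this worker has finished all of its items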
        conn.send('done')
    
    
    def modconffactory():
        """
        Function to launch all the worker processes and wait until they all complete 
        their tasks
        """
        processes = []
        generator = totaltransmitterdict.values()
        for i in range(WORKERS_NUMBER):
            conn, childConn = multiprocessing.Pipe()
            process = multiprocessing.Process(target=modconffactoryProcess, args=(generator, WORKERS_NUMBER, i, childConn))
            process.start()
            processes.append((process, conn))
        # Here we have created, started and saved to a list all the worker processes
        working = True
        finishedProcessesNumber = 0
        try:
            while working:
                for process, conn in processes:
                    if conn.poll(0.1):  # Wait up to 0.1 s for a message from this worker
                        message = conn.recv()
                        if message == 'done':
                            finishedProcessesNumber += 1
                if finishedProcessesNumber == WORKERS_NUMBER:
                    working = False
            for process, conn in processes:
                process.join()  # Reap the finished worker processes
        except KeyboardInterrupt:
            for process, conn in processes:
                process.terminate()  # Stop the workers before exiting
            print('Aborted')
    

    You can adjust WORKERS_NUMBER to your needs.
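
    Instead of hard-coding it, you can also derive the worker count from the machine itself. A minimal sketch (note that os.cpu_count() may return None, hence the fallback):

    import os

    # One worker per logical CPU core, falling back to 1 if the count is unknown
    WORKERS_NUMBER = os.cpu_count() or 1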

    The same thing with multiprocessing.Pool:

    import multiprocessing
    
    WORKERS_NUMBER = 8
    
    def modconffactoryProcess(transmitter):
        for reciever in totalrecieverdict.values():
            for processor in totalprocessordict.values():
                for holoarray in totalholoarraydict.values():
                    for databus in totaldatabusdict.values():
                        for multiplexer in totalmultiplexerdict.values():
                            newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
                            data_I_need = dosomethingwith(newconfiguration)
                            saveforlateruse_if_useful(data_I_need)
    
    
    def modconffactory():
        # The with-block makes sure the pool's workers are cleaned up afterwards
        with multiprocessing.Pool(WORKERS_NUMBER) as pool:
            pool.map(modconffactoryProcess, totaltransmitterdict.values())
    

    You would probably like to use .map_async instead of .map, so the call returns immediately instead of blocking.
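
    A minimal sketch of what that could look like (the result handling here is illustrative, not from the original code):

    def modconffactory():
        with multiprocessing.Pool(WORKERS_NUMBER) as pool:
            # map_async returns an AsyncResult immediately instead of blocking
            result = pool.map_async(modconffactoryProcess, totaltransmitterdict.values())
            # ... the main process is free to do other work here ...
            result.wait()  # Block only once we actually need the workers to finish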

    Both snippets do the same thing, but I would say the first one gives you more control over the program, while the second one is the easiest. The first one should give you an idea of what is happening inside the second one, though.

    One caveat for both snippets: they assume the global dictionaries (totaltransmitterdict and friends) are visible inside the worker processes. That holds with the fork start method (the default on Linux), but with spawn (the default on Windows and on recent macOS) the workers do not inherit data built at runtime, so you would have to pass it to them explicitly.
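
    One more idea, as a sketch on top of this answer (not from the original): if the items vary a lot in cost, splitting the work only by the outer loop can leave it unevenly balanced across workers. You can instead flatten all six loops with itertools.product and let the pool hand out chunks of the flattened stream:

    import itertools
    import multiprocessing

    def process_configuration(combination):
        newconfiguration = list(combination)
        data_I_need = dosomethingwith(newconfiguration)
        saveforlateruse_if_useful(data_I_need)

    def modconffactory():
        # itertools.product yields the combinations lazily, one tuple at a time
        combinations = itertools.product(
            totaltransmitterdict.values(),
            totalrecieverdict.values(),
            totalprocessordict.values(),
            totalholoarraydict.values(),
            totaldatabusdict.values(),
            totalmultiplexerdict.values(),
        )
        with multiprocessing.Pool(WORKERS_NUMBER) as pool:
            # imap_unordered streams the input instead of building a list of all
            # 100**6 combinations (which pool.map would do); a large chunksize
            # keeps the inter-process communication overhead low
            for _ in pool.imap_unordered(process_configuration, combinations, chunksize=10000):
                pass

    A bigger chunksize means less communication overhead but coarser load balancing, so it is a knob worth tuning.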

    multiprocessing docs: https://docs.python.org/3/library/multiprocessing.html