Search code examples
pythonpython-2.7python-multiprocessingpython-multithreadingconcurrent.futures

How to deal with an IO and CPU bound at the same time in the context of parallelism?


Is it possible launch multiple threads on all available CPUs rather than one? An example code would be great.

Alternatively, can I span multiple processes and then create multi threading within each process?

I am using multithreading which works fine for IO side of my script. However, my script is also computation expensive so I would like to launch multiple threads on multiple CPUs.

My code flow:

def worker(url):
    extract url (io bound)  
    process url content (cpu bound)

What should be the efficient way to deal with such type of worker?


Solution

  • Is it possible launch multiple threads on all available CPUs rather than one?

    In general, threads run on whatever CPU is available. Unless you have specified a thread/process to run on a specific CPU. (How this is done varies per operating system)

    However, if you are using the Python implementation from python.org ("CPython"), it doesn't matter. CPython has a "Global Interpreter Lock" that enforces that only one thread at a time is executing Python bytecode. So using threads will not yield improved processing performance with CPython.

    So for computationally expensive tasks, you should probably use the multiprocessing module to do it in different processes. If the same work is done on a lot of data, using a multiprocessing.Pool is generally a good idea.