python-3.x, python-multiprocessing

Why does implementing multiprocessing make my program slower?


I'm trying to implement multiprocessing in my code to make it faster.

To make it easier to understand, I will just say that the program fits an observed curve using a linear combination of curves from a library, and from that fit it measures properties of the observed curve.

I have to do this for over 400 curves, and to estimate the errors of these properties I run a Monte Carlo simulation, which means repeating each calculation a number of times.

This takes a lot of time and work, and since I believe it is a CPU-bound task, I figured I'd use multiprocessing in the error estimation step. Here's a simplification of my code:

Without multiprocessing

import numpy as np
import fitting_package
import multiprocessing
from collections import defaultdict


def estimate_errors(best_fit_curve, signal_to_noise, fit_kwargs, iterations=100):
    results = defaultdict(list)
    def fit(best_fit_curve, signal_to_noise, fit_kwargs, results):
        # Here noise is added to simulate a new curve (Monte Carlo simulation)
        noise = best_fit_curve / signal_to_noise
        simulated_curve = np.random.normal(best_fit_curve, noise)
        # The arguments from the original fit (outside the error estimation) are passed to the fitting
        fit_kwargs.update({'curve': simulated_curve})
        # The fit is performed and it returns the properties packed together
        solutions = fitting_package(**fit_kwargs)
        # There are more properties, so this is a simplification
        property_1, property_2 = solutions
        aux_dict = {'property_1': property_1, 'property_2': property_2}
        for key, value in aux_dict.items():
            results[key].append(value)
    for _ in range(iterations):
        fit(best_fit_curve, signal_to_noise, fit_kwargs, results)
    return results

With multiprocessing

def estimate_errors(best_fit_curve, signal_to_noise, fit_kwargs, iterations=100):
    def fit(best_fit_curve, signal_to_noise, fit_kwargs, queue):
        # Take the shared results dict out of the queue, update it, put it back
        results = queue.get()
        noise = best_fit_curve / signal_to_noise
        simulated_curve = np.random.normal(best_fit_curve, noise)
        fit_kwargs.update({'curve': simulated_curve})
        solutions = fitting_package(**fit_kwargs)
        property_1, property_2 = solutions
        aux_dict = {'property_1': property_1, 'property_2': property_2}
        for key, value in aux_dict.items():
            results[key].append(value)
        queue.put(results)
    process_list = []
    queue = multiprocessing.Queue()
    queue.put(defaultdict(list))
    # One new process per Monte Carlo iteration
    for _ in range(iterations):
        process = multiprocessing.Process(target=fit, args=(best_fit_curve, signal_to_noise, fit_kwargs, queue))
        process.start()
        process_list.append(process)
    for p in process_list:
        p.join()
    results = queue.get()
    return results

I thought using multiprocessing would save time, but it actually takes more than twice as long as the serial version. Why is this? Is there any way I can make it faster with multiprocessing?


Solution

  • I thought using multiprocessing would save time, but it actually takes more than twice as long as the serial version. Why is this?

    Starting a process takes a long time (at least in computer terms). It also uses a lot of memory.

    In your code, you are starting 100 separate Python interpreters in 100 separate OS processes. Unless each process then runs for a comparatively long time, the cost of starting it will dominate the time it spends doing useful work.
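    To get a feel for that overhead on your machine, you can time 100 processes whose target does nothing at all (a minimal sketch, not from the original post; the absolute numbers depend heavily on the platform and start method):

    import multiprocessing
    import time

    def noop():
        # A target that does no work, so any measured time is pure process overhead.
        pass

    if __name__ == '__main__':
        start = time.perf_counter()
        processes = [multiprocessing.Process(target=noop) for _ in range(100)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        print(f'100 do-nothing processes: {time.perf_counter() - start:.2f} s')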

    In addition to that, unless you actually have 100 unused CPU cores, those 100 processes will spend most of their time waiting for each other to finish. Even worse, since they all have the same priority, the OS will try to give each of them a fair share of time, so it runs them for a bit, suspends them, runs others for a bit, suspends those, and so on. All this scheduling also takes time.
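    You can check how many cores you actually have with the standard library; that count is a sensible upper bound for the number of worker processes:

    import os
    print(os.cpu_count())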

    Having more parallel workloads than parallel resources cannot speed up your program, since the excess workloads have to wait to be executed one after another anyway.

    Parallelism will only speed up your program if the time for the parallel tasks is not dominated by the time of creating, managing, scheduling, and re-joining the parallel tasks.
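    There is also a second problem in the posted code: because there is only one results dict in the queue, every process blocks in queue.get() until the previous one calls queue.put(), so the fits effectively run one at a time anyway. One way to fix both issues (a sketch, assuming fitting_package and the contents of fit_kwargs can be pickled) is a multiprocessing.Pool: it starts one worker per CPU core, once, reuses those workers for all iterations, and has each task return its result instead of sharing a dict:

    import multiprocessing
    from collections import defaultdict
    import numpy as np
    import fitting_package  # the placeholder fitting routine from the question

    def fit_once(args):
        # Runs in a worker process: simulate one noisy curve and fit it.
        best_fit_curve, signal_to_noise, fit_kwargs = args
        # Fresh OS-seeded generator per task, so forked workers don't share RNG state.
        rng = np.random.default_rng()
        noise = best_fit_curve / signal_to_noise
        simulated_curve = rng.normal(best_fit_curve, noise)
        fit_kwargs = dict(fit_kwargs, curve=simulated_curve)
        property_1, property_2 = fitting_package(**fit_kwargs)
        return {'property_1': property_1, 'property_2': property_2}

    def estimate_errors(best_fit_curve, signal_to_noise, fit_kwargs, iterations=100):
        results = defaultdict(list)
        args = [(best_fit_curve, signal_to_noise, fit_kwargs)] * iterations
        with multiprocessing.Pool() as pool:  # defaults to os.cpu_count() workers
            for solution in pool.map(fit_once, args):
                for key, value in solution.items():
                    results[key].append(value)
        return results

    Even then, if a single fit only takes a few milliseconds, pickling the arguments and results can still dominate. In that case it is usually better to parallelize at a coarser level, for example over the 400 curves rather than over the 100 Monte Carlo iterations of a single curve.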