Search code examples
pythonmultiprocessingdeadlockxgboost

Why is training this xgboost model in a subprocess not terminating?


Given the following program running my_function in a subprocess using run_process_timeout_wrapper leads to a timeout (over 160s), while running it "normally" takes less than a second.

from multiprocessing import Process, Queue
import time
import numpy as np
import xgboost


def run_process_timeout_wrapper(function, args, timeout):

    def foo(n, out_q):
        res = function(*n)
        out_q.put(res)  # to get result back from thread target

    result_q = Queue()
    p = Process(target=foo, args=(args, result_q))
    p.start()

    try:
        x = result_q.get(timeout=timeout)
    except Empty as e:
        p.terminate()
        raise multiprocessing.TimeoutError("Timed out after waiting for {}s".format(timeout))

    p.terminate()
    return x


def my_function(fun):
    print("Started")
    t1 = time.time()
    pol = xgboost.XGBRegressor()
    pol.fit(np.random.rand(5,1500), np.random.rand(50,1))
    print("Took ", time.time() - t1)
    pol.predict(np.random.rand(2,1500))

    return 5


if __name__ == '__main__':

    t1 = time.time()
    pol = xgboost.XGBRegressor()
    pol.fit(np.random.rand(50,150000), np.random.rand(50,1))
    print("Took ", time.time() - t1)

    my_function(None)


    t1 = time.time()
    res = run_process_timeout_wrapper(my_function, (None,),160)
    
    print("Res ",  res, " Time ", time.time() - t1)

I am running this on Linux. Since it has come up, I have also added a print in the beginning of my_function showing that this function is at least reached.


Solution

  • Gathered from this issue I found that forking a multi-threaded application is problematic. One possible solution is to add

    if __name__ == "__main__":
         mp.set_start_method('spawn')
    

    However, this may lead to other issues.