CuPy raises an error in multiprocessing.pool if the GPU has already been used

I use CuPy in two parts of my program, one of which is parallelized with a multiprocessing pool, and the pool part fails with an error. I managed to reproduce the problem with a simple example:

import cupy
import numpy as np
from multiprocessing import pool


def f(x):
    return cupy.asnumpy(2*cupy.array(x))



input = np.array([1,2,3,4])
print(cupy.asnumpy(cupy.array(input)))


print(np.array(list(map(f, input))))

p = pool.Pool(4)
output = p.map(f, input)
p.close()
p.join()
print(output)

The output is the following:

[1 2 3 4]
[2 4 6 8]
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 489, in _handle_results
    task = get()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "cupy/cuda/runtime.pyx", line 126, in cupy.cuda.runtime.CUDARuntimeError.__init__
TypeError: an integer is required

Also, the code freezes and doesn't exit, but I think that is unrelated to CuPy.

My configuration is:

CuPy Version          : 5.2.0
CUDA Root             : /usr/local/cuda-10.0
CUDA Build Version    : 10000
CUDA Driver Version   : 10000
CUDA Runtime Version  : 10000
cuDNN Build Version   : 7301
cuDNN Version         : 7301
NCCL Build Version    : 2307

Solution

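This issue is not specific to CuPy. Due to a limitation of CUDA, a child process forked after CUDA has been initialized in the parent cannot use CUDA. In the example, the parent initializes CUDA with the first cupy.array call, so the forked pool workers fail with a CUDARuntimeError; the TypeError in the traceback appears to come from the parent failing to unpickle that exception.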

You need to either call multiprocessing.set_start_method('spawn') (or 'forkserver') before creating the pool, or avoid initializing CUDA (i.e., do not call any CuPy API other than import cupy) until after the child processes have been forked.
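
As a rough illustration of the first option, here is a minimal sketch of the example from the question using the spawn start method. It uses multiprocessing.get_context('spawn'), which scopes the start method to one pool instead of setting it globally. The names double and main are made up for this sketch, and the fallback to plain Python when CuPy is not installed is an assumption added only so that the sketch can run on a machine without CuPy.

```python
import multiprocessing

try:
    import cupy  # importing alone does not initialize CUDA
except ImportError:
    cupy = None  # assumption for this sketch: fall back to plain Python


def double(x):
    """Double x on the GPU when CuPy is available, otherwise on the CPU."""
    if cupy is None:
        return 2 * x
    return cupy.asnumpy(2 * cupy.array(x)).item()


def main():
    values = [1, 2, 3, 4]

    # The parent may touch the GPU first; this is what broke the forked pool.
    print([double(v) for v in values])

    # A spawn context starts each worker as a fresh interpreter, so no worker
    # inherits the parent's CUDA state.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(2) as p:
        print(p.map(double, values))


if __name__ == "__main__":
    main()
```

With spawn, the if __name__ == "__main__" guard is required, because each worker re-imports the main module and would otherwise try to create a pool of its own. The worker function also has to be defined at module level so that it can be pickled.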