I'm writing a server with Twisted and pyCUDA. A restriction of how CUDA works is that I must access the CUDA context from the same thread in which I initialized it. However, Twisted's thread pool doesn't let me request a specific thread.
For example, if I have multiple clients connected to the server, each will request some computation done with CUDA, and many of those operations will reuse the same CUDA object (initialization is expensive). I wanted to use the deferToThread function, but it defers to 'some' arbitrary pool thread, not a specific one. Ideally I'd like a mechanism like deferToThread that lets me specify which thread the code runs on. Any suggestions would be appreciated; maybe Twisted is the wrong way to go for this project.
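One common workaround is to keep a single dedicated worker thread that owns the CUDA context, and hand it jobs through a queue. The sketch below uses only the standard library (`threading`, `queue`, `concurrent.futures.Future`); the class name `SingleThreadExecutor` is my own, not a Twisted or pyCUDA API. In a Twisted server you would convert the `Future` back into a `Deferred` via `reactor.callFromThread` so results re-enter the event loop.

```python
import threading
import queue
from concurrent.futures import Future

class SingleThreadExecutor:
    """Runs every submitted callable on one dedicated worker thread.

    Because all jobs execute on the same OS thread, any CUDA context
    created inside that thread can safely be used by every job.
    (Hypothetical helper, not part of Twisted or pyCUDA.)
    """
    def __init__(self):
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # This loop is the only code that ever touches the CUDA context.
        while True:
            job = self._jobs.get()
            if job is None:          # shutdown sentinel
                break
            func, args, future = job
            try:
                future.set_result(func(*args))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, func, *args):
        """Schedule func(*args) on the worker thread; returns a Future."""
        future = Future()
        self._jobs.put((func, args, future))
        return future

    def shutdown(self):
        self._jobs.put(None)
        self._worker.join()

if __name__ == "__main__":
    ex = SingleThreadExecutor()
    # Every submitted call runs on the same thread, whoever submits it.
    idents = {ex.submit(threading.get_ident).result() for _ in range(10)}
    assert len(idents) == 1
    ex.shutdown()
```

From the reactor thread you would call `submit()` and then, inside the submitted function's completion, use `reactor.callFromThread(d.callback, result)` to fire a `Deferred` safely back on the reactor.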
The CUDA Driver API has supported submitting work to a CUcontext from multiple threads for many releases, through the functions cuCtxPushCurrent() and cuCtxPopCurrent(). Since CUDA 4.0, the CUDA Runtime also supports submitting work to a device (CUcontext) from multiple OS threads, or submitting work to multiple devices from a single OS thread, using the function cudaSetDevice().
I'm not sure if this is exposed through pyCUDA.
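For what it's worth, pyCUDA does expose this push/pop pair as `Context.push()` and `Context.pop()` in `pycuda.driver`. A minimal sketch of migrating one context between threads, assuming a CUDA-capable GPU and pyCUDA installed (not runnable without hardware):

```python
# Sketch only: requires a CUDA device and pyCUDA.
import threading
import pycuda.driver as cuda

cuda.init()
ctx = cuda.Device(0).make_context()  # created (and current) on this thread
cuda.Context.pop()                   # release it so another thread can take it

def worker():
    ctx.push()                       # make the same context current here
    # ... launch kernels, copy memory, etc. ...
    cuda.Context.pop()               # release it again when done

t = threading.Thread(target=worker)
t.start()
t.join()

ctx.push()                           # reclaim on the original thread
ctx.detach()                         # clean up the context
```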