I am working on a variation of A3C/ACER and I have several workers, each running on its own thread. I am using OpenAI gym environments.
Python threading works, but it cannot fully utilize all cores: since the workers are CPU-bound and there is no blocking I/O to release the GIL, only one thread executes Python bytecode at a time.
I would like the workers to somehow release the GIL while executing actions in their respective environments.
I would appreciate your feedback: does this make sense, and is it possible?
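To make the bottleneck concrete, here is a minimal sketch of the threaded setup described above, using a toy stand-in class instead of a real gym environment (`ToyEnv` is hypothetical; replace it with `gym.make(...)`). Because `env.step()` here is pure Python, the GIL serializes the workers even though each has its own thread.

```python
import threading

class ToyEnv:
    """Stand-in for a gym environment (hypothetical; swap in gym.make(...))."""
    def reset(self):
        self.steps = 0
        return 0.0

    def step(self, action):
        # obs, reward, done, info -- mimics the gym step() signature
        self.steps += 1
        return float(self.steps), 1.0, self.steps >= 5, {}

results = {}

def worker(worker_id):
    # Each worker has its own thread, but env.step() is pure Python
    # and holds the GIL, so the threads run interleaved, not in
    # parallel across cores.
    env = ToyEnv()
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step(0)  # placeholder action
        total += reward
    results[worker_id] = total

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```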
Answering my own question: I found that a quite efficient approach is demonstrated in OpenAI's universe-starter-agent: https://github.com/openai/universe-starter-agent.
The implementation uses TensorFlow and runs the workers as independent processes, together with a parameter server, so each worker has its own interpreter and its own GIL.
I think this can be a useful reference for other people too.