I've used the 'njobs' parameter to get multi-sample results, and the speed is far from what I expected.
I've changed the '.theanorc' file to set the 'floatX' and 'cnmem' values, etc.
I've monitored GPU usage with the 'nvidia-smi' command, and the GPU is well utilized.
But the sampling speed is still slow, even slower than on the CPU.
Is that normal?
GPU support is still experimental, and we've seen speed-ups for some models and slow-downs for others. ADVI seems to be easier to run on the GPU, though. You can also check that all your model types and input data are float32.
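
For the float32 check, something like the following might help. It is only a rough sketch using plain NumPy: the helper `find_non_float32` and the `data` dictionary are made up for illustration and are not part of PyMC3 or Theano. It relies on the fact that NumPy creates float64 arrays by default, so input data usually has to be cast explicitly.

```python
import numpy as np


def find_non_float32(arrays):
    """Return the names of arrays whose dtype is not float32 (hypothetical helper)."""
    return [
        name
        for name, arr in arrays.items()
        if np.asarray(arr).dtype != np.float32
    ]


# Hypothetical input data; NumPy produces float64 unless told otherwise.
data = {
    "X": np.random.randn(100, 3),                 # float64 by default
    "y": np.random.randn(100).astype("float32"),  # already float32
}

# List the inputs that still need casting.
print(find_non_float32(data))

# Cast everything to float32 before handing it to the model.
data32 = {name: np.asarray(arr, dtype="float32") for name, arr in data.items()}
print(find_non_float32(data32))
```

If Theano is importable, casting to `theano.config.floatX` instead of the hard-coded `"float32"` would keep the data in step with whatever is set in '.theanorc'.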