python-3.x theano theano-cuda gpuarray

Problems with Theano installation using CUDA when using non-root user


I followed the instructions to install Theano and GPUArray from source (git versions) into the system folders (not as a user). The GPUArray tests run fine, without errors.

The problem is that Theano only uses the GPU when I run it as root. Running the GPU test example as a regular user gives:

(python35) rll@ip-30-92:~$ THEANO_FLAGS=device=cuda python temp.py 
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/theano/gpuarray/__init__.py", line 179, in <module>
    use(config.device)
  File "/usr/local/lib/python3.5/dist-packages/theano/gpuarray/__init__.py", line 166, in use
    init_dev(device, preallocate=preallocate)
  File "/usr/local/lib/python3.5/dist-packages/theano/gpuarray/__init__.py", line 73, in init_dev
    context.cudnn_handle = dnn._make_handle(context)
  File "/usr/local/lib/python3.5/dist-packages/theano/gpuarray/dnn.py", line 83, in _make_handle
    cudnn = _dnn_lib()
  File "/usr/local/lib/python3.5/dist-packages/theano/gpuarray/dnn.py", line 70, in _dnn_lib
    raise RuntimeError('Could not find cudnn library (looked for v5* or v6*)')
RuntimeError: Could not find cudnn library (looked for v5* or v6*)
[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
Looping 1000 times took 3.201078 seconds
Result is [ 1.23178032  1.61879341  1.52278065 ...,  2.20771815  2.29967753
  1.62323285]
Used the cpu

If I run it as root it works, although there is still a cuDNN-related error (cudnn.h is not found when Theano tries to compile with cuDNN):

(python35) rll@ip-30-92:~$ sudo THEANO_FLAGS=device=cuda python3 temp.py 
Can not use cuDNN on context None: cannot compile with cuDNN. We got this error:
b'/tmp/try_flags_bg7m03hd.c:4:19: fatal error: cudnn.h: No such file or directory\ncompilation terminated.\n'
Mapped name None to device cuda: TITAN X (Pascal) (0000:01:00.0)
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float64, vector)>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.390976 seconds
Result is [ 1.23178032  1.61879341  1.52278065 ...,  2.20771815  2.29967753
  1.62323285]
Used the gpu

There are two Titan X cards in this machine, and they work fine with TensorFlow. I am not using a .theanorc file, but I have set both of these variables:

(python35) rll@ip-30-92:~$ echo $LD_LIBRARY_PATH 
/usr/local/cuda-8.0/lib64
(python35) rll@ip-30-92:~$ echo $CUDA_ROOT
/usr/local/cuda-8.0/

I did everything as per the instructions, and despite some warnings there were no errors.

I don't think it is a permissions error on the compile dir .theano, because if I chown the .theano dir the behaviour is the same.

How can I fix this?


Solution

  • I finally found the problem. The Theano installation instructions omit one step: you have to check whether LIBRARY_PATH is set and add the CUDA library directory to it. Note that this is LIBRARY_PATH, not LD_LIBRARY_PATH: GCC searches LIBRARY_PATH at link time, whereas LD_LIBRARY_PATH is only used by the dynamic loader at run time.

    If it is not set, just export it and you will be good to go. As a temporary fix for the current shell session:

     export LIBRARY_PATH=/usr/local/cuda-8.0/lib64
    

    How to make it persistent depends on the system, but in general you can add this line to /etc/environment:

    LIBRARY_PATH=/usr/local/cuda-8.0/lib64
    

    This fixed the error message that appeared when running as root, and made CUDA work for the regular user.
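
To check the variables before changing anything, a small script along these lines may help. It is only a sketch: the helper names `has_entry` and `with_entry` are made up for illustration, the CUDA path is the one from the question and may differ on your machine, and the script only prints a suggested value rather than modifying your environment. Unlike a plain `export`, it keeps any directories already present in LIBRARY_PATH.

```python
import os

# Path taken from the question; adjust it to your own CUDA install (assumption).
CUDA_LIB = "/usr/local/cuda-8.0/lib64"
SEP = ":"  # separator used in path-list variables on Linux


def has_entry(path_list, entry):
    """Return True if `entry` is one of the directories in a colon-separated list."""
    wanted = entry.rstrip("/")
    return any(p.rstrip("/") == wanted for p in path_list.split(SEP) if p)


def with_entry(path_list, entry):
    """Return `path_list` with `entry` appended, unless it is already present."""
    if has_entry(path_list, entry):
        return path_list
    return entry if not path_list else path_list + SEP + entry


if __name__ == "__main__":
    # Report whether each variable already contains the CUDA library directory.
    for var in ("LIBRARY_PATH", "LD_LIBRARY_PATH"):
        current = os.environ.get(var, "")
        status = "ok" if has_entry(current, CUDA_LIB) else "missing"
        print("{}: {!r} ({})".format(var, current, status))

    # Print an export line that preserves whatever LIBRARY_PATH already holds.
    suggested = with_entry(os.environ.get("LIBRARY_PATH", ""), CUDA_LIB)
    print("suggested: export LIBRARY_PATH=" + suggested)
```

Run it as the regular user and again with sudo; if the two outputs differ, the difference in environment is a likely place to look.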