Tags: gpu, deep-learning, baidu, paddle-paddle

How to deploy a PaddlePaddle Docker container with GPU support?


There are slight discrepancies between the Docker deployment documentation for PaddlePaddle[1] and the documentation for manually installing PaddlePaddle from source[2].

The Docker deployment documentation states that after pulling the image from Docker Hub:

docker pull paddledev/paddle

the following environment variables should be set and passed to the docker run command, i.e.:

export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') $(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')"
export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
docker run ${CUDA_SO} ${DEVICES} -it paddledev/paddle:gpu-latest
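
(For a single library and the device nodes, ${CUDA_SO} and ${DEVICES} expand to flags like the following: the host driver files are bind-mounted into the container at the same paths, and the /dev/nvidia* device nodes are passed through. The paths shown here are only illustrative:)

docker run \
  -v /usr/lib64/libcuda.so.1:/usr/lib64/libcuda.so.1 \
  --device /dev/nvidia0:/dev/nvidia0 \
  --device /dev/nvidiactl:/dev/nvidiactl \
  --device /dev/nvidia-uvm:/dev/nvidia-uvm \
  -it paddledev/paddle:gpu-latest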

The export commands seem to look for libcuda* and libnvidia* in /usr/lib64/, but according to the source-compilation documentation, lib64/ should be located at /usr/local/cuda/lib64.

Regardless, the location of lib64/ can be found with:

cat /etc/ld.so.conf.d/cuda.conf

Additionally, the export command looks for libnvidia*, which doesn't seem to exist anywhere in /usr/local/cuda/ except for libnvidia-ml.so:

/usr/local/cuda$ find . -name 'libnvidia*'
./lib64/stubs/libnvidia-ml.so
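
(For comparison, the driver-side libraries normally live outside the CUDA toolkit tree; something like the following should show where libcuda* and libnvidia* actually sit on the host. Exact locations vary by distribution and driver version:)

# libraries known to the dynamic loader
ldconfig -p | grep -E 'libcuda|libnvidia'
# brute-force search under /usr
find /usr -name 'libcuda*' -o -name 'libnvidia*' 2>/dev/null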

I suppose the files CUDA_SO is actually meant to pick up are

  • /usr/local/cuda/lib64/libcudart.so.8.0
  • /usr/local/cuda/lib64/libcudart.so.7.5

But is that right? What should the CUDA_SO environment variable be set to in order to deploy PaddlePaddle with GPU support?

Even after setting the variable to the libcudart* files, the Docker container still doesn't find the GPU driver, i.e.:
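
(In the transcript below, CUDA_CONFILE was set beforehand to the toolkit's lib64 directory from cuda.conf; judging by the mounted paths it resolves to /usr/local/cuda/lib64, i.e. presumably something like:)

export CUDA_CONFILE=/usr/local/cuda/lib64   # inferred from the -v paths in the echo output below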

user0@server1:~/dockdock$ echo CUDA_SO="$(\ls $CUDA_CONFILE/libcuda* | xargs -I{} echo '-v {}:{}')"
CUDA_SO=-v /usr/local/cuda/lib64/libcudadevrt.a:/usr/local/cuda/lib64/libcudadevrt.a
-v /usr/local/cuda/lib64/libcudart.so:/usr/local/cuda/lib64/libcudart.so
-v /usr/local/cuda/lib64/libcudart.so.8.0:/usr/local/cuda/lib64/libcudart.so.8.0
-v /usr/local/cuda/lib64/libcudart.so.8.0.44:/usr/local/cuda/lib64/libcudart.so.8.0.44
-v /usr/local/cuda/lib64/libcudart_static.a:/usr/local/cuda/lib64/libcudart_static.a

user0@server1:~/dockdock$ export CUDA_SO="$(\ls $CUDA_CONFILE/libcuda* | xargs -I{} echo '-v {}:{}')"

user0@server1:~/dockdock$ export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')

user0@server1:~/dockdock$ docker run ${CUDA_SO} ${DEVICES} -it paddledev/paddle:gpu-latest

root@bd25dfd4f824:/# git clone https://github.com/baidu/Paddle paddle
Cloning into 'paddle'...
remote: Counting objects: 26626, done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 26626 (delta 3), reused 0 (delta 0), pack-reused 26603
Receiving objects: 100% (26626/26626), 25.41 MiB | 4.02 MiB/s, done.
Resolving deltas: 100% (18786/18786), done.
Checking connectivity... done.

root@bd25dfd4f824:/# cd paddle/demo/quick_start/

root@bd25dfd4f824:/paddle/demo/quick_start# sed -i 's|--use_gpu=false|--use_gpu=true|g' train.sh 

root@bd25dfd4f824:/paddle/demo/quick_start# bash train.sh 
I0410 09:25:37.300365    48 Util.cpp:155] commandline: /usr/local/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.lr.py --save_dir=./output --trainer_count=4 --log_period=100 --num_passes=15 --use_gpu=true --show_parameter_stats_period=100 --test_all_data_in_one_period=1 
F0410 09:25:37.300940    48 hl_cuda_device.cc:526] Check failed: cudaSuccess == cudaStat (0 vs. 35) Cuda Error: CUDA driver version is insufficient for CUDA runtime version
*** Check failure stack trace: ***
    @     0x7efc20557daa  (unknown)
    @     0x7efc20557ce4  (unknown)
    @     0x7efc205576e6  (unknown)
    @     0x7efc2055a687  (unknown)
    @           0x895560  hl_specify_devices_start()
    @           0x89576d  hl_start()
    @           0x80f402  paddle::initMain()
    @           0x52ac5b  main
    @     0x7efc1f763f45  (unknown)
    @           0x540c05  (unknown)
    @              (nil)  (unknown)
/usr/local/bin/paddle: line 109:    48 Aborted                 (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
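
(Error code 35 here is cudaErrorInsufficientDriver; "CUDA driver version is insufficient for CUDA runtime version" presumably means the container sees the mounted libcudart runtime but no matching libcuda driver library. A rough check from inside the container, assuming the same image as above:)

# check whether a host driver library is actually visible inside the container
ldconfig -p | grep libcuda.so
ls -l /usr/lib64/libcuda.so* /usr/lib/x86_64-linux-gnu/libcuda.so* 2>/dev/null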

  [1]: http://www.paddlepaddle.org/doc/build/docker_install.html
  [2]: http://paddlepaddle.org/doc/build/build_from_source.html

How to deploy a PaddlePaddle Docker container with GPU support?


The same issue, reported in Chinese: https://github.com/PaddlePaddle/Paddle/issues/1764


Solution

  • Please refer to http://www.paddlepaddle.org/develop/doc/getstarted/build_and_install/docker_install_en.html

    The recommended way is to use nvidia-docker.

    Please install nvidia-docker first, following its installation tutorial.

    Now you can run a GPU image:

    docker pull paddlepaddle/paddle
    nvidia-docker run -it --rm paddlepaddle/paddle:0.10.0rc2-gpu /bin/bash
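
  • As a quick sanity check that the GPU is actually visible inside containers (a sketch; nvidia/cuda is NVIDIA's stock CUDA test image and is not part of the PaddlePaddle docs):

    nvidia-docker run --rm nvidia/cuda nvidia-smi

    If nvidia-smi lists the GPUs, re-running the quick_start demo from above with --use_gpu=true inside paddlepaddle/paddle:0.10.0rc2-gpu should no longer hit the "driver version is insufficient" error.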