I'm having an information problem regarding Tensorflow: lots of info in lots of places, and never complete enough.
I have my system set up with CUDA 8.0 and cuDNN, and I have Keras + Theano working OK with Python 2.7. I'm trying to move to Tensorflow.
As I had compatibility problems with numpy and other stuff when I tried to install it in the same environment, I installed miniconda2, created a virtual env for it with conda create -n tensorflow pip,
and activated it, as instructed here: https://www.tensorflow.org/install/install_linux#InstallingAnaconda
The environment seems operational.
Afterwards, I installed tensorflow from https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.1-cp27-none-linux_x86_64.whl and also Keras, only to notice I had some modules duplicated in conda list, some marked with a version string, others marked with <pip> only. In particular, I got both tensorflow-gpu 1.2.1 and tensorflow 1.1.0; the old version just comes along with Keras.
Also, there's a myriad of warnings about Tensorflow not being compiled to use certain CPU instruction sets, and there's this answer How to compile Tensorflow with SSE4.2 and AVX instructions? about compiling it with bazel, but I can't really find any information about where to put the source code and which files to move where after running that bazel command line.
To make matters worse, whenever I run a simple 20x20 matrix multiplication with "/gpu:0" as the device, the code lists those horrendous warnings and correctly detects the presence of a GTX 1070, but never really confirms it was used to do the calculations. And it runs faster on "/cpu:0". How I miss Theano...
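The test is essentially this (a trimmed-down sketch; I added log_device_placement=True expecting the session to print which device each op ends up on):

import tensorflow as tf

# 20x20 matrix multiplication pinned to the GPU
with tf.device("/gpu:0"):
    a = tf.random_normal([20, 20])
    b = tf.random_normal([20, 20])
    c = tf.matmul(a, b)

# log_device_placement should print where each op was placed;
# allow_soft_placement lets ops fall back to the CPU if needed
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))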
Could someone point me to where I can find:
I'm using Linux Mint 18.
I have used conda and installed Tensorflow 1.1.0, but it never seemed to work correctly within Python. I also came across GitHub issues saying that Anaconda is still working on the Tensorflow GPU version, so no matter what I tried in Anaconda, it never used my NVIDIA Tesla P100-SXM2-16GB card and only used the CPU.
I suggest you use the normal environment till they get Tensorflow-gpu to work right in Anaconda.
To check whether tensorflow-gpu works, I used the Inception v3 model with TF 0.12 / TF 1.0.
This is the process I go through to install Tensorflow 1.0:
Step 0.
sudo -i
apt-get install aptitude
aptitude install software-properties-common
apt-get install libcupti-dev python-pip
apt-get update
apt-get upgrade libc6
Step 1. Install NVIDIA components. I think you already have these installed.
Download the NVIDIA cuDNN 5.1 for CUDA 8.0 from https://developer.nvidia.com/rdp/cudnn-download (Registration in NVIDIA's Accelerated Computing Developer Program is required)
cuDNN 5.1 works well with most of the architectures and operating systems out there.
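If you grabbed the cuDNN tarball, installing it usually just means copying the headers and libraries into your CUDA tree. A minimal sketch, assuming the default /usr/local/cuda install and the v5.1 tarball name (adjust if yours differs):

# run as root, like the other steps in this guide
tar xzvf cudnn-8.0-linux-x64-v5.1.tgz
cp cuda/include/cudnn.h /usr/local/cuda/include
cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*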
Step 2. Install bazel and tensorflow
apt-get install bazel
you can go to this link https://pypi.python.org/pypi/tensorflow-gpu/1.1.0rc0 and do a
pip install <python-wheel-version>
If you have both Python 2.7 and Python 3.x installed, use pip2 to install for Python 2.7.
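For example, assuming Python 2.7 and the 1.1.0rc0 GPU build from the link above (substitute whichever release you actually want):

# GPU build for Python 2.7, pinned to the version on the PyPI page linked above
pip2 install tensorflow-gpu==1.1.0rc0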
Step 3. Install openjdk
apt-get install openjdk-8-jdk
Step 4. git clone the Inception model code
git clone https://github.com/tensorflow/models.git
cd models
git checkout master
cd inception
This is where bazel comes into the picture. See Bazel's Getting Started docs for a more detailed explanation of what a target is. So, if you do a
ls -lstr
you might see 5 bazel-related symbolic links
bazel-bin bazel-genfiles bazel-inception bazel-out bazel-testlogs
these are the target directories into which your specific model gets built
Assuming you're in the models/inception directory
bazel build inception/imagenet_train
This activates the symbolic link
NOTE: For imagenet_train.py to work you need to prepare the ImageNet dataset. You can either skip this part or go through the following:
Step 5. Prepare the ImageNet dataset. Before you run the training script for the first time, you will need to download and convert the ImageNet data to the native TFRecord format. To begin, you will need to sign up for an account with ImageNet to gain access to the data. Look for the sign-up page, create an account and request an access key to download the data.
After you have USERNAME and PASSWORD, you are ready to run our script. Make sure that your hard disk has at least 500 GB of free space for downloading and storing the data. Here we select DATA_DIR=$HOME/imagenet-data as such a location but feel free to edit accordingly.
When you run the below script, please enter USERNAME and PASSWORD when prompted. This will occur at the very beginning. Once these values are entered, you will not need to interact with the script again.
# location where the ImageNet data will be placed (here $HOME is /root)
DATA_DIR=$HOME/imagenet-data
# build the preprocessing script
bazel build inception/download_and_preprocess_imagenet
# run it
bazel-bin/inception/download_and_preprocess_imagenet "${DATA_DIR}"
# place the resulting TFRecord files at /root/dataset
Step 6. Source bazel and tensorflow. This step is very important: it activates the Python packages, and I think you may be getting errors because the Python package for tensorflow is not activated. If you have skipped step 5 then you might want to go to
/models/inception/sample
and run the gpu.py script
python gpu.py
This should verify that your tensorflow version works with your GPU.
source /opt/DL/bazel/bin/bazel-activate
source /opt/DL/tensorflow/bin/tensorflow-activate
You can also check by importing tensorflow into Python, e.g. import tensorflow as tf.
Find a hello-world example on their site; if that gives errors, then it has not been installed properly.
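A minimal check along the lines of the hello-world on their getting-started page looks like this:

import tensorflow as tf

# build a trivial graph and run it; any installation or CUDA problem shows up here
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))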
Step 7. Run the ImageNet training. You can skip this step if you have skipped step 5.
bazel-bin/inception/imagenet_train --num_gpus=1 --batch_size=256 --train_dir=/tmp --data_dir=/root/dataset/ --max_steps=100