I've been running TensorFlow on my lovely early-2015 MBP, CPU only. I decided to build TensorFlow from source with Bazel to speed things up with SSE4.1, SSE4.2, AVX, AVX2 and FMA:
bazel build --copt=-march=native //tensorflow/tools/pip_package:build_pip_package
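With `--copt=-march=native` the compiler picks whatever instruction sets the host CPU supports. If you want to be explicit about which extensions go into the build (for instance to rule out that one of them was silently dropped), you can pass the standard GCC/Clang flags individually; a sketch of what that invocation might look like:

```shell
# Hypothetical explicit variant of the build command above: enable each
# instruction set with its own -copt flag instead of relying on -march=native.
bazel build -c opt \
  --copt=-msse4.1 --copt=-msse4.2 \
  --copt=-mavx --copt=-mavx2 --copt=-mfma \
  //tensorflow/tools/pip_package:build_pip_package
```

Either way, the flags only affect code compiled into the pip package, so make sure the newly built wheel is actually the one installed in your virtualenv.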
But retraining the Inception v3 model with the new install isn't any faster; it takes exactly the same amount of time. That is strange, because when doing inference with a trained Inception model I do get a 12% speed increase, and training the MNIST example is 30% faster.
So is it possible that retraining simply gets no speed benefit from these instruction sets?
I also did a Bazel build of the retrainer as explained here, with the same result.
My ./configure:
Please specify the location of python. [Default is /Users/Gert/Envs/t4/bin/python]: Users/Gert/Envs/t4/bin/python3
Invalid python path. Users/Gert/Envs/t4/bin/python3 cannot be found
Please specify the location of python. [Default is /Users/Gert/Envs/t4/bin/python]: ls
Invalid python path. ls cannot be found
Please specify the location of python. [Default is /Users/Gert/Envs/t4/bin/python]: lslss
Invalid python path. lslss cannot be found
Please specify the location of python. [Default is /Users/Gert/Envs/t4/bin/python]: /rt/Envs/t4/bin/python3^C
(t4) Gerts-MacBook-Pro:tensorflow root#
(t4) Gerts-MacBook-Pro:tensorflow root# ./configure
Please specify the location of python. [Default is /Users/Gert/Envs/t4/bin/python]: /Users/Gert/Envs/t4/bin/python3
Please specify optimization flags to use during compilation [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? (Linux only) [Y/n] n
jemalloc disabled on Linux
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] n
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] n
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] n
No XLA JIT support will be enabled for TensorFlow
Found possible Python library paths:
/Users/Gert/Envs/t4/lib/python3.4/site-packages
Please input the desired Python library path to use. Default is [/Users/Gert/Envs/t4/lib/python3.4/site-packages]
Using python library path: /Users/Gert/Envs/t4/lib/python3.4/site-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] n
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] n
No CUDA support will be enabled for TensorFlow
Configuration finished
Thanks,
Gert
The MNIST example spends most of its time inside the matrix product.
On the other hand, typical CNNs spend most of their time inside the convolutions.
TF uses Eigen for its matrix products on the CPU, which is quite well optimized as I understand it, and that is why you see a noticeable speed-up there.
Convolutions on the CPU are not as optimized, if my info is current. They spend much of their time copying data into a layout that can be processed as a matrix multiplication (the im2col step), so speeding up the matrix product itself has less of an impact.
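To make the copying overhead concrete, here is a minimal NumPy sketch of the im2col approach (function names are mine, not TF's): each image patch is copied into a column of a matrix, and only then does a single matrix product do the actual arithmetic. The patch-copying loop is exactly the part that fast matmul instructions cannot accelerate.

```python
import numpy as np

def conv2d_direct(x, k):
    """Naive valid 2-D convolution (cross-correlation, as in CNNs)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def conv2d_im2col(x, k):
    """Same convolution via im2col: copy patches, then one matmul."""
    H, W = x.shape
    kh, kw = k.shape
    oh, ow = H - kh + 1, W - kw + 1
    # Copy every kh*kw patch into a column -- this data movement is the
    # overhead described above, untouched by AVX/FMA matmul speed-ups.
    cols = np.empty((kh * kw, oh * ow))
    idx = 0
    for i in range(oh):
        for j in range(ow):
            cols[:, idx] = x[i:i + kh, j:j + kw].ravel()
            idx += 1
    # The arithmetic itself is now a single matrix product.
    return (k.ravel() @ cols).reshape(oh, ow)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k = rng.standard_normal((3, 3))
assert np.allclose(conv2d_direct(x, k), conv2d_im2col(x, k))
```

So in a conv-heavy model like Inception, a faster matmul only shaves time off the second half of each convolution, which is consistent with retraining seeing little benefit while the matmul-dominated MNIST example speeds up.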