python python-3.x tensorflow cublas cudnn

TensorFlow CUBLAS_STATUS_ALLOC_FAILED error

TF version: 1.6.0 GPU  
OS: Windows 10 64-bit  
CUDA: 9.0  
CUDNN: 7.0.5 for CUDA 9.0  
GPU: GeForce GTX 1070  
GPU driver version: 385.54  
RAM: 23.95GB  
CPU: Intel i7-3770k @3.50GHz  
Python version: 3.6.4

The code I'm working on worked last week but no longer does. No changes have been made to the network-related code.
Import and initialization work without problems, but the errors appear as soon as TF starts training. It seems to happen when it processes a minibatch and sets q_target.

The code that gets executed is:

q_target = self.target_net.y.eval(feed_dict={self.target_net.x: next_state})

target_net is a convolutional neural network.

target_net.y shape=(None, 18) dtype=float32  
target_net.x shape=(None, 720, 600, 4) dtype=float32

Using smaller images and a correspondingly smaller x shape still produces the error.
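For a sense of scale, the size of the array fed to target_net.x can be estimated from the shape above. This is only a back-of-the-envelope sketch: the batch size is not stated in the question, so BATCH_SIZE below is a hypothetical value, and the helper name input_bytes is made up for illustration.

```python
# Rough size of the array fed to target_net.x, from the shape (None, 720, 600, 4).
# BATCH_SIZE is a hypothetical value; the question does not state the real one.
HEIGHT, WIDTH, CHANNELS = 720, 600, 4
BYTES_PER_FLOAT32 = 4
BATCH_SIZE = 32  # assumption, for illustration only


def input_bytes(batch_size):
    """Bytes needed to hold one float32 input batch of the given size."""
    return batch_size * HEIGHT * WIDTH * CHANNELS * BYTES_PER_FLOAT32


per_sample_mib = input_bytes(1) / 2**20
per_batch_mib = input_bytes(BATCH_SIZE) / 2**20
print("per sample: %.2f MiB" % per_sample_mib)
print("per batch of %d: %.2f MiB" % (BATCH_SIZE, per_batch_mib))
```

This counts only the input array. The activations and weights of the convolutional layers come on top of it, so the figure is a lower bound rather than a full estimate of what the network needs.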

The error output I get is as follows:

2018-03-22 15:20:10.452238: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:443] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2018-03-22 15:20:10.452669: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:443] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2018-03-22 15:20:11.379190: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2018-03-22 15:20:11.379442: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2018-03-22 15:20:11.379676: F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)

EDIT: The CPU version of TF works fine, so the problem is specific to the GPU version. I would still prefer to use the GPU, because performance is important here.

Solution

Lowering the per_process_gpu_memory_fraction setting seems to work!

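import tensorflow as tf

# Let TensorFlow claim at most 99% of the GPU's memory
tf_config = tf.ConfigProto()
tf_config.gpu_options.per_process_gpu_memory_fraction = 0.99
with tf.Session(config=tf_config) as sess:
    pass  # build and train the network inside this session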
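If 0.99 turns out not to leave enough headroom on another machine, one way to pick a value is to decide how much memory to hold back and convert that into a fraction. The sketch below is only an illustration: gpu_memory_fraction is a hypothetical helper, 8192 MiB is the nominal memory of a GTX 1070, and the 512 MiB reserve is an arbitrary assumption, not a measured requirement.

```python
def gpu_memory_fraction(total_mib, reserve_mib):
    """Fraction of GPU memory left for TensorFlow after holding back reserve_mib."""
    if total_mib <= 0 or not 0 <= reserve_mib < total_mib:
        raise ValueError("need total_mib > 0 and 0 <= reserve_mib < total_mib")
    return (total_mib - reserve_mib) / total_mib


# Assumed numbers: 8192 MiB nominal on a GTX 1070, 512 MiB held back.
fraction = gpu_memory_fraction(8192, 512)
print(fraction)

# With TF 1.x the value would then be assigned the same way as the 0.99 above:
# tf_config.gpu_options.per_process_gpu_memory_fraction = fraction
```

TF 1.x also has a gpu_options.allow_growth flag that makes the process allocate GPU memory on demand instead of up front. It may be worth trying as an alternative, though it was not tested for this problem.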