TF version: 1.6.0 (GPU)
OS: Windows 10 64-bit
CUDA: 9.0
cuDNN: 7.0.5 for CUDA 9.0
GPU: GeForce GTX 1070
GPU driver version: 385.54
RAM: 23.95GB
CPU: Intel i7-3770k @3.50GHz
Python version: 3.6.4
The code I'm working on worked last week, but not anymore. No changes have been made to the network-related code.
There are no problems on import or initialization, but when TF starts training I run into errors. It appears to happen when it processes a minibatch and sets q_target.
The code that gets executed is:
q_target = self.target_net.y.eval(feed_dict={self.target_net.x: next_state})
target_net is a convolutional neural network.
target_net.y shape=(None, 18) dtype=float32
target_net.x shape=(None, 720, 600, 4) dtype=float32
Smaller images (and a correspondingly smaller x shape) still produce the error.
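For a rough sense of scale, here is a sketch of how much memory a single feed of target_net.x needs, assuming float32 (4 bytes per element) as the shape above states. The helper name input_bytes and the batch size of 32 are hypothetical, not taken from the original code:

```python
def input_bytes(batch_size: int, height: int, width: int, channels: int,
                bytes_per_elem: int = 4) -> int:
    """Bytes for one NHWC input batch (input tensor only, no activations or weights)."""
    return batch_size * height * width * channels * bytes_per_elem

# Shape from target_net.x: (None, 720, 600, 4), float32 -> 4 bytes per element.
per_sample = input_bytes(1, 720, 600, 4)
batch = input_bytes(32, 720, 600, 4)  # hypothetical batch size of 32
print(f"{per_sample / 2**20:.2f} MiB per sample")
print(f"{batch / 2**20:.2f} MiB per batch of 32")
```

This counts only the input tensor; the intermediate convolution activations and the weights come on top of it, so the real footprint on the GPU is larger.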
The error I get is as follows:
2018-03-22 15:20:10.452238: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:443] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2018-03-22 15:20:10.452669: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:443] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2018-03-22 15:20:11.379190: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2018-03-22 15:20:11.379442: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2018-03-22 15:20:11.379676: F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
EDIT: The CPU version of TF works fine, so this is a GPU-only problem. I would still prefer to use the GPU, since efficiency is important here.
Lowering the per_process_gpu_memory_fraction setting seems to work!
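import tensorflow as tf
tf_config = tf.ConfigProto()
# Cap the GPU memory TF reserves for this process at 0.99 of the card's total
tf_config.gpu_options.per_process_gpu_memory_fraction = 0.99
with tf.Session(config=tf_config) as sess:
    ...  # training loop goes here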
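A plausible reading of why this helps: per_process_gpu_memory_fraction caps how much GPU memory TF's own allocator reserves, and whatever stays outside that pool remains available to allocations TF does not manage itself, such as the cuBLAS and cuDNN handles that fail to be created in the log. The sketch below only does the arithmetic for that leftover headroom. The helper name headroom_mib is hypothetical, and the 8192 MiB total is an assumption for a GTX 1070 (the usable figure on Windows will be somewhat lower because the display also uses VRAM):

```python
def headroom_mib(total_mib: int, fraction: float) -> float:
    """MiB of GPU memory left outside TF's pool for a given
    per_process_gpu_memory_fraction (simple arithmetic, no driver query)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return total_mib * (1.0 - fraction)

# Assumption: the GTX 1070 reports roughly 8192 MiB in total.
TOTAL_MIB = 8192
for frac in (0.99, 0.9, 0.7):
    print(f"fraction={frac}: ~{headroom_mib(TOTAL_MIB, frac):.0f} MiB outside TF's pool")
```

If 0.99 turns out to be too tight again, lowering the fraction further leaves more room; TF 1.x also offers gpu_options.allow_growth = True, which makes TF grow its pool on demand instead of reserving it up front.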