Tags: python, tensorflow, object-detection-api

Tensorflow 2 Object Detection API: STATUS_ALLOC_FAILED for CUBLAS and CUDNN


To use the TensorFlow Object Detection API's transfer-learning capabilities, I followed the "Training Custom Object Detector" tutorial (https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html).

When running the script to continue training a pre-trained model:

python model_main_tf2.py --model_dir=models/my_ssd_resnet50_v1_fpn --pipeline_config_path=models/my_ssd_resnet50_v1_fpn/pipeline.config

(found here: https://github.com/tensorflow/models/blob/master/research/object_detection/model_main_tf2.py)

the run fails with the errors failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED and Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED.
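
For context, TensorFlow does detect the GPU (a Quadro P3200, as seen in the log below). A minimal standalone sanity check, separate from the tutorial scripts and assuming a TF 2.x install, that confirms what the process can see:

import tensorflow as tf

# Hypothetical standalone check, not part of the tutorial:
# confirm the TensorFlow build is CUDA-enabled and that it can see the GPU.
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("GPUs visible to TensorFlow:", tf.config.list_physical_devices('GPU'))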

Appendix: Full Error Message.

2021-03-31 11:23:50.254191: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-03-31 11:24:00.733341: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-31 11:24:55.680591: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-03-31 11:24:55.741417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: Quadro P3200 computeCapability: 6.1
coreClock: 1.543GHz coreCount: 14 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 156.69GiB/s
2021-03-31 11:24:55.753665: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-03-31 11:24:56.258845: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-03-31 11:24:56.264945: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-03-31 11:24:56.342082: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-03-31 11:24:56.388945: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-03-31 11:24:56.726179: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-03-31 11:24:58.861159: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-03-31 11:24:58.882019: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-03-31 11:24:58.888242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-03-31 11:24:58.894981: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-31 11:24:58.917736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: Quadro P3200 computeCapability: 6.1
coreClock: 1.543GHz coreCount: 14 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 156.69GiB/s
2021-03-31 11:24:58.931960: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-03-31 11:24:58.938383: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-03-31 11:24:58.946647: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-03-31 11:24:58.954216: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-03-31 11:24:58.962576: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-03-31 11:24:58.968354: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-03-31 11:24:58.976832: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-03-31 11:24:58.984725: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-03-31 11:24:58.992582: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-03-31 11:25:02.220596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-31 11:25:02.229170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0
2021-03-31 11:25:02.234800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N
2021-03-31 11:25:02.239279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2046 MB memory) -> physical GPU (device: 0, name: Quadro P3200, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-03-31 11:25:02.259351: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0331 11:25:02.274096 31668 mirrored_strategy.py:350] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: None
I0331 11:25:02.285093 31668 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0331 11:25:02.287103 31668 config_util.py:552] Maybe overwriting use_bfloat16: False
WARNING:tensorflow:From C:\Users\212765830\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py:530: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
W0331 11:25:02.854127 31668 deprecation.py:333] From C:\Users\212765830\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py:530: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
INFO:tensorflow:Reading unweighted datasets: ['annotations/train.record']
I0331 11:25:02.866126 31668 dataset_builder.py:163] Reading unweighted datasets: ['annotations/train.record']
INFO:tensorflow:Reading record datasets for input file: ['annotations/train.record']
I0331 11:25:02.868127 31668 dataset_builder.py:80] Reading record datasets for input file: ['annotations/train.record']
INFO:tensorflow:Number of filenames to read: 1
I0331 11:25:02.869126 31668 dataset_builder.py:81] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0331 11:25:02.869126 31668 dataset_builder.py:87] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From C:\Users\212765830\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\object_detection\builders\dataset_builder.py:101: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
W0331 11:25:02.872127 31668 deprecation.py:333] From C:\Users\212765830\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\object_detection\builders\dataset_builder.py:101: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
WARNING:tensorflow:From C:\Users\212765830\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\object_detection\builders\dataset_builder.py:236: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0331 11:25:02.892127 31668 deprecation.py:333] From C:\Users\212765830\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\object_detection\builders\dataset_builder.py:236: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\util\dispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0331 11:25:09.261741 31668 deprecation.py:333] From C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\util\dispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\util\dispatch.py:201: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
W0331 11:25:11.998297 31668 deprecation.py:333] From C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\util\dispatch.py:201: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
WARNING:tensorflow:From C:\Users\212765830\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\object_detection\inputs.py:282: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0331 11:25:13.740744 31668 deprecation.py:333] From C:\Users\212765830\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\object_detection\inputs.py:282: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
2021-03-31 11:25:16.871176: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\keras\backend.py:434: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
  warnings.warn('`tf.keras.backend.set_learning_phase` is deprecated and '
2021-03-31 11:25:55.647417: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-03-31 11:36:31.531924: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.540166: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.546544: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.554175: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.562664: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.569155: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.576934: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.584912: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.592756: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.599329: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.607199: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.615221: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.623010: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.628693: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.635493: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.642145: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.647750: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.654350: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.659746: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.666478: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.673162: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.679507: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.686889: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.694029: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.699441: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.706465: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.713689: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.718661: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.725077: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.730345: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.736757: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.744157: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-03-31 11:36:31.783797: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-03-31 11:36:33.966712: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2021-03-31 11:36:33.973962: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
Traceback (most recent call last):
  File "model_main_tf2.py", line 113, in <module>
    tf.compat.v1.app.run()
  File "C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\platform\app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "C:\Users\212765830\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\absl\app.py", line 303, in run
    _run_main(main, args)
  File "C:\Users\212765830\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "model_main_tf2.py", line 104, in main
    model_lib_v2.train_loop(
  File "C:\Users\212765830\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py", line 569, in train_loop
    load_fine_tune_checkpoint(detection_model,
  File "C:\Users\212765830\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py", line 369, in load_fine_tune_checkpoint
    strategy.run(
  File "C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\distribute\distribute_lib.py", line 1259, in run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
  File "C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\distribute\distribute_lib.py", line 2730, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\distribute\mirrored_strategy.py", line 628, in _call_for_each_replica
    return mirrored_run.call_for_each_replica(
  File "C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\distribute\mirrored_run.py", line 75, in call_for_each_replica
    return wrapped(args, kwargs)
  File "C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\eager\def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\eager\def_function.py", line 888, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\eager\function.py", line 2942, in __call__
    return graph_function._call_flat(
  File "C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\eager\function.py", line 1918, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\eager\function.py", line 555, in call
    outputs = execute.execute(
  File "C:\Users\212765830\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[node ResNet50V1_FPN/model/conv1_conv/Conv2D (defined at \Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\object_detection\models\ssd_resnet_v1_fpn_keras_feature_extractor.py:224) ]] [Op:__inference__dummy_computation_fn_19161]

Errors may have originated from an input operation.
Input Source operations connected to node ResNet50V1_FPN/model/conv1_conv/Conv2D:
 ResNet50V1_FPN/model/lambda/Pad (defined at \Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\object_detection\models\keras_models\resnet_v1.py:49)

Function call stack:
_dummy_computation_fn

Solution

  • Can you try and see if this works? Add these lines of code after line 74 in model_main_tf2.py (sketched below).
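
The referenced lines are not reproduced above; what follows is a minimal sketch of the usual fix for CUBLAS/CUDNN *_STATUS_ALLOC_FAILED, assuming the intent is the standard memory-growth workaround that stops TensorFlow from reserving all GPU memory up front:

import tensorflow as tf

# Sketch of the commonly suggested fix (placement after line 74 of
# model_main_tf2.py per the answer above): enable memory growth so the
# GPU allocator leaves room for the cuBLAS/cuDNN handles.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Memory growth must be set before any GPU has been initialized.
        print(e)

If memory growth alone is not enough on a 6 GB card, lowering the batch_size in pipeline.config is another common way to reduce GPU memory pressure.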