Tags: python-3.x, tensorflow, tensorflow2.0, object-detection-api

Object Detection API model_main_tf2.py : Dst tensor is not initialized


I'm trying to use TensorFlow with a GPU, but I keep running into problems. I'm about to give up...

I'm using the Object Detection API with TensorFlow 2.2.0, and I'm trying to run model_main_tf2.py with the following command:

python model_main_tf2.py --model_dir=/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/models/faster_rcnn_inception_resnet_v2 --pipeline_config_path=/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/models/faster_rcnn_inception_resnet_v2/pipeline.config
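If you want to rule out the second card (the log below shows TensorFlow creating GPU:1 with only 52 MB of memory), one option is to start the same command with only GPU 0 visible. The sketch below is a hypothetical helper: `build_launch` and `launch` are illustrative names, not part of the Object Detection API. It reuses the paths from the command above, sets the standard CUDA variable `CUDA_VISIBLE_DEVICES` in the child process environment, and assumes it is run from the directory that contains model_main_tf2.py.

```python
import os
import subprocess
import sys

# Paths copied from the command in the question.
MODEL_DIR = "/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/models/faster_rcnn_inception_resnet_v2"
PIPELINE_CONFIG = MODEL_DIR + "/pipeline.config"


def build_launch(visible_gpus="0"):
    """Return (argv, env) for model_main_tf2.py with only `visible_gpus` exposed.

    CUDA_VISIBLE_DEVICES has to be set before TensorFlow initialises CUDA,
    so it is placed in the child process environment rather than set later.
    """
    argv = [
        sys.executable,
        "model_main_tf2.py",
        f"--model_dir={MODEL_DIR}",
        f"--pipeline_config_path={PIPELINE_CONFIG}",
    ]
    # Copy the current environment and override only the GPU list.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=visible_gpus)
    return argv, env


def launch(visible_gpus="0"):
    """Run training in a child process and return its exit code."""
    argv, env = build_launch(visible_gpus)
    return subprocess.run(argv, env=env).returncode
```

Calling `launch("0")` should be equivalent to prefixing the shell command with `CUDA_VISIBLE_DEVICES=0`; CUDA renumbers the visible devices, so the remaining card appears to TensorFlow as GPU:0.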

I get the following output:

2021-03-18 20:48:33.947464: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-03-18 20:48:33.984880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:8e:00.0 name: Tesla V100-PCIE-16GB computeCapability: 7.0
coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-18 20:48:33.988155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:9c:00.0 name: Tesla V100-PCIE-16GB computeCapability: 7.0
coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-18 20:48:33.988792: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-03-18 20:48:33.991147: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-03-18 20:48:33.993016: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-03-18 20:48:33.993360: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-03-18 20:48:33.995848: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-03-18 20:48:33.997723: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-03-18 20:48:34.003189: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-03-18 20:48:34.017701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1
2021-03-18 20:48:34.018129: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2021-03-18 20:48:34.042797: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3000000000 Hz
2021-03-18 20:48:34.060539: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fe080000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-03-18 20:48:34.060586: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-03-18 20:48:34.498255: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6cb0aa0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-03-18 20:48:34.498292: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla V100-PCIE-16GB, Compute Capability 7.0
2021-03-18 20:48:34.498300: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): Tesla V100-PCIE-16GB, Compute Capability 7.0
2021-03-18 20:48:34.499612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:8e:00.0 name: Tesla V100-PCIE-16GB computeCapability: 7.0
coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-18 20:48:34.500303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:9c:00.0 name: Tesla V100-PCIE-16GB computeCapability: 7.0
coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-18 20:48:34.500390: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-03-18 20:48:34.500408: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-03-18 20:48:34.500424: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-03-18 20:48:34.500438: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-03-18 20:48:34.500453: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-03-18 20:48:34.500467: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-03-18 20:48:34.500482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-03-18 20:48:34.510455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1
2021-03-18 20:48:34.510513: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-03-18 20:48:34.515846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-18 20:48:34.515864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 1
2021-03-18 20:48:34.515876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N Y
2021-03-18 20:48:34.515883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 1:   Y N
2021-03-18 20:48:34.520362: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3595 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:8e:00.0, compute capability: 7.0)
2021-03-18 20:48:34.521752: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 52 MB memory) -> physical GPU (device: 1, name: Tesla V100-PCIE-16GB, pci bus id: 0000:9c:00.0, compute capability: 7.0)
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
I0318 20:48:34.543391 140628099540800 mirrored_strategy.py:500] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
INFO:tensorflow:Maybe overwriting train_steps: None
I0318 20:48:34.547359 140628099540800 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0318 20:48:34.547507 140628099540800 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Reading unweighted datasets: ['/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/annotations/TRAIN.record']
I0318 20:48:36.083467 140628099540800 dataset_builder.py:163] Reading unweighted datasets: ['/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/annotations/TRAIN.record']
INFO:tensorflow:Reading record datasets for input file: ['/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/annotations/TRAIN.record']
I0318 20:48:36.085170 140628099540800 dataset_builder.py:80] Reading record datasets for input file: ['/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/annotations/TRAIN.record']
INFO:tensorflow:Number of filenames to read: 1
I0318 20:48:36.085289 140628099540800 dataset_builder.py:81] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0318 20:48:36.085340 140628099540800 dataset_builder.py:88] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
W0318 20:48:36.091829 140628099540800 deprecation.py:323] From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
WARNING:tensorflow:From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0318 20:48:36.120102 140628099540800 deprecation.py:323] From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/inputs.py:96: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0318 20:48:48.361122 140628099540800 deprecation.py:323] From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/inputs.py:96: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/inputs.py:282: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0318 20:48:56.421583 140628099540800 deprecation.py:323] From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/inputs.py:282: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
2021-03-18 20:49:12.346383: W tensorflow/core/common_runtime/bfc_allocator.cc:434] Allocator (GPU_1_bfc) ran out of memory trying to allocate 162.68MiB (rounded to 170581504)
Current allocation summary follows.
2021-03-18 20:49:12.346462: I tensorflow/core/common_runtime/bfc_allocator.cc:934] BFCAllocator dump for GPU_1_bfc
2021-03-18 20:49:12.346493: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (256):   Total Chunks:3, Chunks in use: 3. 768B allocated for chunks. 768B in use in bin. 48B client-requested in use in bin.
2021-03-18 20:49:12.346519: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (512):   Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346547: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (1024):  Total Chunks:1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-03-18 20:49:12.346572: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (2048):  Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346596: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (4096):  Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346621: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (8192):  Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346645: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (16384):         TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346670: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (32768):         TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346694: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (65536):         TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346719: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (131072):        TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346743: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (262144):        TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346767: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (524288):        TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346791: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (1048576):       TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346816: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (2097152):       TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346840: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (4194304):       TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346864: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (8388608):       TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346888: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (16777216):      TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.347077: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (33554432):      TotalChunks: 1, Chunks in use: 0. 52.56MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.347101: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (67108864):      TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.347125: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (134217728):     TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.347149: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (268435456):     TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.347176: I tensorflow/core/common_runtime/bfc_allocator.cc:957] Bin for 162.68MiB was 128.00MiB, Chunk State:
2021-03-18 20:49:12.347197: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Next region of size 55115776
2021-03-18 20:49:12.347225: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000000 of size 256 next 1
2021-03-18 20:49:12.347247: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000100 of size 1280 next 2
2021-03-18 20:49:12.347267: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000600 of size 256 next 3
2021-03-18 20:49:12.347288: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000700 of size 256 next 4
2021-03-18 20:49:12.347309: I tensorflow/core/common_runtime/bfc_allocator.cc:990] Free  at 7fda08000800 of size 55113728 next 18446744073709551615
2021-03-18 20:49:12.347329: I tensorflow/core/common_runtime/bfc_allocator.cc:995]      Summary of in-use Chunks by size:
2021-03-18 20:49:12.347352: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 3 Chunks of size 256 totalling 768B
2021-03-18 20:49:12.347374: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 1 Chunks of size 1280 totalling 1.2KiB
2021-03-18 20:49:12.347395: I tensorflow/core/common_runtime/bfc_allocator.cc:1002] Sum Total of in-use chunks: 2.0KiB
2021-03-18 20:49:12.347416: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] total_region_allocated_bytes_: 55115776 memory_limit_: 55115776 available bytes: 0 curr_region_allocation_bytes_: 110231552
2021-03-18 20:49:12.347444: I tensorflow/core/common_runtime/bfc_allocator.cc:1010] Stats:
Limit:                    55115776
InUse:                        2048
MaxInUse:                     2048
NumAllocs:                       6
MaxAllocSize:                 1280

2021-03-18 20:49:12.347509: W tensorflow/core/common_runtime/bfc_allocator.cc:439] *___________________________________________________________________________________________________
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 1986, in execution_mode
    yield
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 655, in _next_internal
    output_shapes=self._flat_output_shapes)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2363, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 6653, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
         [[{{node RemoteCall}}]] [Op:IteratorGetNext]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "model_main_tf2.py", line 134, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "model_main_tf2.py", line 131, in main
    record_summaries=FLAGS.record_summaries)
  File "/tf/EPhotoCompteur_Object_Detection/models/research/object_detection/model_lib_v2.py", line 554, in train_loop
    unpad_groundtruth_tensors)
  File "/tf/EPhotoCompteur_Object_Detection/models/research/object_detection/model_lib_v2.py", line 338, in load_fine_tune_checkpoint
    features, labels = iter(input_dataset).next()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/input_lib.py", line 292, in next
    return self.__next__()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/input_lib.py", line 296, in __next__
    return self.get_next()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/input_lib.py", line 316, in get_next
    self._iterators[i].get_next_as_list_static_shapes(new_name))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/input_lib.py", line 1112, in get_next_as_list_static_shapes
    return self._iterator.get_next()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py", line 581, in get_next
    result.append(self._device_iterators[i].get_next())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 741, in get_next
    return self._next_internal()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 661, in _next_internal
    return structure.from_compatible_tensor_list(self._element_spec, ret)
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 1989, in execution_mode
    executor_new.wait()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/executor.py", line 67, in wait
    pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
         [[{{node RemoteCall}}]]
2021-03-18 20:49:22.355141: W tensorflow/core/common_runtime/bfc_allocator.cc:434] Allocator (GPU_1_bfc) ran out of memory trying to allocate 162.68MiB (rounded to 170581504)
Current allocation summary follows.
2021-03-18 20:49:22.355187: I tensorflow/core/common_runtime/bfc_allocator.cc:934] BFCAllocator dump for GPU_1_bfc
2021-03-18 20:49:22.355202: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (256):   Total Chunks:3, Chunks in use: 3. 768B allocated for chunks. 768B in use in bin. 48B client-requested in use in bin.
2021-03-18 20:49:22.355211: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (512):   Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355220: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (1024):  Total Chunks:1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-03-18 20:49:22.355229: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (2048):  Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355237: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (4096):  Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355245: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (8192):  Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355253: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (16384):         TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355262: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (32768):         TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355270: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (65536):         TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355278: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (131072):        TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355286: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (262144):        TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355293: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (524288):        TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355301: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (1048576):       TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355309: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (2097152):       TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355317: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (4194304):       TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355325: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (8388608):       TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355333: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (16777216):      TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355342: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (33554432):      TotalChunks: 1, Chunks in use: 0. 52.56MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355350: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (67108864):      TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355358: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (134217728):     TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355367: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (268435456):     TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355376: I tensorflow/core/common_runtime/bfc_allocator.cc:957] Bin for 162.68MiB was 128.00MiB, Chunk State:
2021-03-18 20:49:22.355383: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Next region of size 55115776
2021-03-18 20:49:22.355396: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000000 of size 256 next 1
2021-03-18 20:49:22.355404: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000100 of size 1280 next 2
2021-03-18 20:49:22.355412: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000600 of size 256 next 3
2021-03-18 20:49:22.355418: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000700 of size 256 next 16
2021-03-18 20:49:22.355425: I tensorflow/core/common_runtime/bfc_allocator.cc:990] Free  at 7fda08000800 of size 55113728 next 18446744073709551615
2021-03-18 20:49:22.355433: I tensorflow/core/common_runtime/bfc_allocator.cc:995]      Summary of in-use Chunks by size:
2021-03-18 20:49:22.355441: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 3 Chunks of size 256 totalling 768B
2021-03-18 20:49:22.355449: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 1 Chunks of size 1280 totalling 1.2KiB
2021-03-18 20:49:22.355456: I tensorflow/core/common_runtime/bfc_allocator.cc:1002] Sum Total of in-use chunks: 2.0KiB
2021-03-18 20:49:22.355463: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] total_region_allocated_bytes_: 55115776 memory_limit_: 55115776 available bytes: 0 curr_region_allocation_bytes_: 110231552
2021-03-18 20:49:22.355475: I tensorflow/core/common_runtime/bfc_allocator.cc:1010] Stats:
Limit:                    55115776
InUse:                        2048
MaxInUse:                    34816
NumAllocs:                      20
MaxAllocSize:                12800

2021-03-18 20:49:22.355503: W tensorflow/core/common_runtime/bfc_allocator.cc:439] *___________________________________________________________________________________________________

If it's a GPU memory problem, I don't know how to solve it, so I need your help ;) Thanks!
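The relevant numbers are already in the output above: TensorFlow created GPU:0 with 3595 MB but GPU:1 with only 52 MB, and the failing 162.68 MiB request came from the GPU_1 allocator, whose limit is 55115776 bytes (about 52.56 MiB). That pattern suggests most of GPU 1's memory was already in use when training started, although the log alone cannot say by what. A minimal stdlib sketch for pulling those figures out of a saved log follows; `summarize_gpu_memory` is a hypothetical helper, and the regexes assume the TF 2.2 message wording shown above.

```python
import re

# Matches e.g. "device:GPU:1 with 52 MB memory" in "Created TensorFlow device" lines.
DEVICE_RE = re.compile(r"device:GPU:(\d+) with (\d+) MB memory")
# Matches e.g. "Allocator (GPU_1_bfc) ran out of memory trying to allocate 162.68MiB".
OOM_RE = re.compile(
    r"Allocator \(GPU_(\d+)_bfc\) ran out of memory trying to allocate ([\d.]+)MiB"
)


def summarize_gpu_memory(log_text):
    """Return {gpu_index: {"created_mb": int or None, "failed_mib": [float, ...]}}."""
    summary = {}

    def entry(gpu):
        # One record per GPU index, created on first sight.
        return summary.setdefault(int(gpu), {"created_mb": None, "failed_mib": []})

    # Memory TensorFlow reserved for each device at startup.
    for gpu, mb in DEVICE_RE.findall(log_text):
        entry(gpu)["created_mb"] = int(mb)
    # Every allocation request that the BFC allocator could not satisfy.
    for gpu, mib in OOM_RE.findall(log_text):
        entry(gpu)["failed_mib"].append(float(mib))
    return summary
```

Run over the full output above, this would report GPU 1 with 52 MB and two failed 162.68 MiB requests (one per allocator dump), which is why checking what else is running on that card, for example with `nvidia-smi`, may be worth doing before tuning the model.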


Solution

  • This could be due to the batch_size set in the train_config section of the pipeline.config file. Try reducing it to 1 and see whether training starts.