I'm using an object detection module for classifying images. My setup is as follows:
I'm working in a virtualenv on Windows and on Google Colab. This is the error message I get:
python3 legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config
INFO:tensorflow:global step 1: loss = 18.5013 (48.934 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
warnings.warn("Attempting to use a closed FileWriter. "
Traceback (most recent call last):
File "legacy/train.py", line 184, in <module>
tf.app.run()
File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "legacy/train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "/home/priyank/venv/models-master/research/object_detection/legacy/trainer.py", line 416, in train
saver=saver)
File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 785, in train
ignore_live_threads=ignore_live_threads)
File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 832, in stop
ignore_live_threads=ignore_live_threads)
File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/home/priyank/venv/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 257, in _run
enqueue_callable()
File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1257, in _single_operation_run
self._call_tf_sessionrun(None, {}, [], target_list, None)
File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[15,1,1755,2777,3] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
[[{{node batch}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
You can try the following fixes:
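1. Reduce the image dimensions if you are using very high-resolution images. The failing tensor has shape [15,1,1755,2777,3], which suggests the input images are roughly 1755x2777 pixels.
2. Reduce the batch size.
3. Check whether any other process is using up your memory.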
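To see why the first two fixes help, here is a rough back-of-the-envelope sketch of how much memory the tensor from the error message needs. It assumes the "float" in the message means 4-byte float32 values. The helper name `tensor_bytes` and the two alternative shapes are made up for illustration and are not taken from your config.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor (assumes float32: 4 bytes per element)."""
    return reduce(mul, shape, 1) * bytes_per_element


# Shape copied from the OOM error message.
oom_shape = (15, 1, 1755, 2777, 3)

# Hypothetical alternatives, only to compare the effect of each fix.
smaller_batch = (4, 1, 1755, 2777, 3)   # fewer images per batch
smaller_images = (15, 1, 600, 950, 3)   # images downscaled before training

for label, shape in [
    ("as reported", oom_shape),
    ("smaller batch", smaller_batch),
    ("smaller images", smaller_images),
]:
    print(f"{label}: {tensor_bytes(shape) / 1e6:.1f} MB")
```

Under that assumption the single tensor in the error is close to 0.9 GB, and the input queue may hold more than one such batch at a time, so either shrinking the images or lowering the batch size cuts the requirement substantially.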
Could you also please share your config file?