
Training MobileNetSSD Object Detection on a Custom Dataset Fails in Google Colab


I'm following a Google Colab guide from Roboflow to train the MobileNetSSD object detection model from TensorFlow on a custom dataset. Here is the link to the Colab guide: https://colab.research.google.com/drive/1wTMIrJhYsQdq_u7ROOkf0Lu_fsX5Mu8a

The dataset is the example set from the Roboflow website called "Chess sample", which everyone who registers an account on the website gets in their workspace folder. Here is the link to get that set up: https://blog.roboflow.com/getting-started-with-roboflow/

When following the Colab, every step runs fine until the step "Train the model". That step fails and prints the following output:

Using TensorFlow backend.
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0407 09:32:54.234921 140683261384576 model_lib.py:839] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting train_steps: 30000
I0407 09:32:54.235201 140683261384576 config_util.py:552] Maybe overwriting train_steps: 30000
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0407 09:32:54.235418 140683261384576 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: 1
I0407 09:32:54.235595 140683261384576 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: 1
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0407 09:32:54.235742 140683261384576 config_util.py:552] Maybe overwriting eval_num_epochs: 1
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W0407 09:32:54.235958 140683261384576 model_lib.py:855] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
INFO:tensorflow:create_estimator_and_inputs: use_tpu False, export_to_tpu None
I0407 09:32:54.236146 140683261384576 model_lib.py:892] create_estimator_and_inputs: use_tpu False, export_to_tpu None
INFO:tensorflow:Using config: {'_model_dir': 'training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7ff2e7837050>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0407 09:32:54.236825 140683261384576 estimator.py:212] Using config: {'_model_dir': 'training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7ff2e7837050>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7ff2e7827b90>) includes params argument, but params are not passed to Estimator.
W0407 09:32:54.237167 140683261384576 model_fn.py:630] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7ff2e7827b90>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Not using Distribute Coordinator.
I0407 09:32:54.237831 140683261384576 estimator_training.py:186] Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
I0407 09:32:54.238137 140683261384576 training.py:612] Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
I0407 09:32:54.238746 140683261384576 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From /tensorflow-1.15.2/python3.7/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0407 09:32:54.247304 140683261384576 deprecation.py:323] From /tensorflow-1.15.2/python3.7/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Reading unweighted datasets: ['/content/tensorflow-object-detection-faster-rcnn/data/train/Pieces.tfrecord']
I0407 09:32:54.314255 140683261384576 dataset_builder.py:162] Reading unweighted datasets: ['/content/tensorflow-object-detection-faster-rcnn/data/train/Pieces.tfrecord']
INFO:tensorflow:Reading record datasets for input file: ['/content/tensorflow-object-detection-faster-rcnn/data/train/Pieces.tfrecord']
I0407 09:32:54.315222 140683261384576 dataset_builder.py:79] Reading record datasets for input file: ['/content/tensorflow-object-detection-faster-rcnn/data/train/Pieces.tfrecord']
INFO:tensorflow:Number of filenames to read: 1
I0407 09:32:54.315392 140683261384576 dataset_builder.py:80] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0407 09:32:54.315579 140683261384576 dataset_builder.py:87] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /content/models/research/object_detection/builders/dataset_builder.py:104: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0407 09:32:54.323929 140683261384576 deprecation.py:323] From /content/models/research/object_detection/builders/dataset_builder.py:104: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
WARNING:tensorflow:From /content/models/research/object_detection/builders/dataset_builder.py:236: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0407 09:32:54.377984 140683261384576 deprecation.py:323] From /content/models/research/object_detection/builders/dataset_builder.py:236: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:Entity <bound method TfExampleDecoder.decode of <object_detection.data_decoders.tf_example_decoder.TfExampleDecoder object at 0x7ff2e7837e50>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Index'
W0407 09:32:54.518877 140683261384576 ag_logging.py:146] Entity <bound method TfExampleDecoder.decode of <object_detection.data_decoders.tf_example_decoder.TfExampleDecoder object at 0x7ff2e7837e50>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Index'
Traceback (most recent call last):
  File "/content/models/research/object_detection/model_main.py", line 109, in <module>
    tf.app.run()
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/content/models/research/object_detection/model_main.py", line 105, in main
    tf_estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
    input_fn, ModeKeys.TRAIN))
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1025, in _get_features_and_labels_from_input_fn
    self._call_input_fn(input_fn, mode))
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1116, in _call_input_fn
    return input_fn(**kwargs)
  File "/content/models/research/object_detection/inputs.py", line 770, in _train_input_fn
    params=params)
  File "/content/models/research/object_detection/inputs.py", line 913, in train_input
    reduce_to_frame_fn=reduce_to_frame_fn)
  File "/content/models/research/object_detection/builders/dataset_builder.py", line 251, in build
    input_reader_config)
  File "/content/models/research/object_detection/builders/dataset_builder.py", line 236, in dataset_map_fn
    fn_to_map, num_parallel_calls=num_parallel_calls)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 1950, in map_with_legacy_function
    use_legacy_function=True))
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 3472, in __init__
    use_legacy_function=use_legacy_function)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 2689, in __init__
    self._function.add_to_graph(ops.get_default_graph())
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 545, in add_to_graph
    self._create_definition_if_needed()
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 377, in _create_definition_if_needed
    self._create_definition_if_needed_impl()
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 408, in _create_definition_if_needed_impl
    capture_resource_var_by_value=self._capture_resource_var_by_value)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 944, in func_graph_from_py_func
    outputs = func(*func_graph.inputs)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 2681, in wrapper_fn
    ret = _wrapper_helper(*args)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 2652, in _wrapper_helper
    ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper
    raise e.ag_error_metadata.to_exception(e)
NotImplementedError: in converted code:

    /content/models/research/object_detection/data_decoders/tf_example_decoder.py:580 decode
        default_groundtruth_weights)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/util/deprecation.py:507 new_func
        return func(*args, **kwargs)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/control_flow_ops.py:1235 cond
        orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/control_flow_ops.py:1061 BuildCondBranch
        original_result = fn()
    /content/models/research/object_detection/data_decoders/tf_example_decoder.py:573 default_groundtruth_weights
        dtype=tf.float32)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py:2560 ones
        output = _constant_if_small(one, shape, dtype, name)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py:2295 _constant_if_small
        if np.prod(shape) < 1000:
    <__array_function__ internals>:6 prod
        
    /usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py:3052 prod
        keepdims=keepdims, initial=initial, where=where)
    /usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py:86 _wrapreduction
        return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py:736 __array__
        " array.".format(self.name))

    NotImplementedError: Cannot convert a symbolic Tensor (cond_2/strided_slice:0) to a numpy array.

Our guess is that the Python or NumPy version in Colab is now newer than it was when the notebook was created.


Solution

  • Yes, indeed: downgrading NumPy will solve the issue. We saw this same bug in the Roboflow Faster R-CNN tutorial. The installs below have since been added to the MobileNet SSD Roboflow tutorial notebook.

    !pip install numpy==1.19.5
    !pip uninstall -y pycocotools
    !pip install pycocotools --no-binary pycocotools
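
Before re-running the training cell, it can help to confirm that the pin actually took effect. The following is a minimal sketch and not part of the Roboflow notebook: the helper names `parse_version`, `numpy_is_newer_than_pin`, and `check_numpy` are made up for illustration, and it assumes, as the answer suggests, that the failure is tied to NumPy releases newer than the pinned 1.19.5.

```python
# Hypothetical helpers (not from the Roboflow notebook) that compare the
# installed NumPy version against the version pinned in the answer.

PINNED_NUMPY = "1.19.5"  # the version installed by the answer's pip command


def parse_version(text):
    """Turn a string such as '1.19.5' into a tuple such as (1, 19, 5).

    Only the leading digits of each of the first three components are kept,
    so a suffix like 'rc1' is ignored.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def numpy_is_newer_than_pin(installed, pinned=PINNED_NUMPY):
    """Return True if `installed` is a later version than `pinned`."""
    return parse_version(installed) > parse_version(pinned)


def check_numpy():
    """Print a hint if the NumPy loaded in this session is newer than the pin."""
    import numpy  # imported lazily so the helpers above work without NumPy

    if numpy_is_newer_than_pin(numpy.__version__):
        print("NumPy", numpy.__version__, "is newer than", PINNED_NUMPY,
              "- the downgrade may not have been applied to this session.")
    else:
        print("NumPy", numpy.__version__, "is at or below the pinned version.")
```

Calling `check_numpy()` in a notebook cell after the installs shows which version is loaded. If NumPy was already imported before the downgrade, the runtime may need a restart before the older version is picked up.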