Tags: json, dictionary, tensorflow, yaml, object-detection-api

Convert a pipeline_pb2.TrainEvalPipelineConfig to a JSON or YAML file for the TensorFlow Object Detection API


I want to convert a pipeline_pb2.TrainEvalPipelineConfig to JSON or YAML file format for the TensorFlow Object Detection API. I tried converting the protobuf file using:

import tensorflow as tf
from google.protobuf import text_format
import json
import yaml

from object_detection.protos import pipeline_pb2

def get_configs_from_pipeline_file(pipeline_config_path, config_override=None):

  '''
  Read the .config file and convert it to a protobuf object.
  '''

  pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
  with tf.gfile.GFile(pipeline_config_path, "r") as f:
    proto_str = f.read()
    text_format.Merge(proto_str, pipeline_config)
  if config_override:
    text_format.Merge(config_override, pipeline_config)
  #print(pipeline_config)
  return pipeline_config


def create_configs_from_pipeline_proto(pipeline_config):
  '''
  Returns the configurations as a dictionary.
  '''

  configs = {}
  configs["model"] = pipeline_config.model
  configs["train_config"] = pipeline_config.train_config
  configs["train_input_config"] = pipeline_config.train_input_reader
  configs["eval_config"] = pipeline_config.eval_config
  configs["eval_input_configs"] = pipeline_config.eval_input_reader
  # Keeps eval_input_config only for backwards compatibility. All clients should
  # read eval_input_configs instead.
  if configs["eval_input_configs"]:
    configs["eval_input_config"] = configs["eval_input_configs"][0]
  if pipeline_config.HasField("graph_rewriter"):
    configs["graph_rewriter_config"] = pipeline_config.graph_rewriter

  return configs


configs = get_configs_from_pipeline_file('pipeline.config')
config_as_dict = create_configs_from_pipeline_proto(configs)

But when I try converting the returned dictionary to YAML with yaml.dump(config_as_dict), it says:

TypeError: can't pickle google.protobuf.pyext._message.RepeatedCompositeContainer objects

For json.dumps(config_as_dict) it says:

Traceback (most recent call last):
  File "config_file_parsing.py", line 48, in <module>
    config_as_json = json.dumps(config_as_dict)
  File "/usr/lib/python3.5/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: label_map_path: "label_map.pbtxt"
shuffle: true
tf_record_input_reader {
  input_path: "dataset.record"
}
 is not JSON serializable

Would appreciate some help here.


Solution

  • JSON can only dump a subset of the Python primitives plus dict and list collections (with limitations on self-referencing).

    YAML is more powerful and can be used to dump arbitrary Python objects, but only if those objects can be "investigated" during the representation phase of the dump, which essentially limits that to instances of pure Python classes. For objects created at the C level, one can register explicit dumpers; if none are available, the YAML library falls back on the pickle protocol to dump the data.
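
    As a small illustration (a sketch assuming PyYAML's default, unsafe Dumper; the class name is made up), a pure Python instance dumps fine because the representer can fall back on the pickle protocol, whereas the C-backed repeated-field containers in your config trigger exactly the "can't pickle" TypeError shown above:

    import yaml

    class PurePython:
        # a plain Python class whose attributes PyYAML can reach via the
        # pickle protocol
        def __init__(self, path):
            self.path = path

    # Works with the default Dumper: the instance is written out under a
    # !!python/object tag.
    print(yaml.dump(PurePython("label_map.pbtxt")))

    # yaml.safe_dump(PurePython(...)) would refuse outright, and a protobuf
    # RepeatedCompositeContainer fails even with the default Dumper, which
    # is the TypeError from the question.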

    Inspecting protobuf on PyPI shows me that there are non-generic wheels available, which is always an indication of some C code optimization. Inspecting one of these files indeed shows a pre-compiled shared object.

    Although you make a dict out of the config, this dict can of course only be dumped when all its keys and all its values can be dumped. Since your keys are strings (necessary for JSON), you need to look at each of the values to find the ones that don't dump, and convert those to a dumpable object structure (dict/list for JSON, pure Python class for YAML).
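
    One way to do that hunting (a hypothetical helper, not part of the Object Detection API) is to try dumping each top-level value on its own and report the ones that fail:

    import yaml

    def find_undumpable(configs):
        # Try each top-level value separately; collect the keys whose
        # values cannot be dumped and the exception they raise.
        bad = []
        for key, value in configs.items():
            try:
                yaml.dump(value)
            except Exception as exc:
                bad.append((key, exc))
        return bad

    for key, exc in find_undumpable(config_as_dict):
        print(key, "->", exc)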

    You might want to take a look at the google.protobuf.json_format module.
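
    For example (a sketch, assuming a reasonably recent protobuf release; the output file names are just illustrative), json_format can convert the whole pipeline proto into plain dicts, lists and strings, which both json and yaml then dump without complaint:

    import json
    import yaml
    from google.protobuf import json_format

    pipeline_config = get_configs_from_pipeline_file('pipeline.config')

    # MessageToDict returns only plain Python types (dict, list, str, ...),
    # so the result is serializable by both json and yaml.
    config_dict = json_format.MessageToDict(
        pipeline_config, preserving_proto_field_name=True)

    with open('pipeline.json', 'w') as f:
        json.dump(config_dict, f, indent=2)

    with open('pipeline.yaml', 'w') as f:
        yaml.safe_dump(config_dict, f, default_flow_style=False)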