I want to convert a pipeline_pb2.TrainEvalPipelineConfig to JSON or YAML file format for tensorflow object detection API. I tried converting the protobuf file using :
import tensorflow as tf
from google.protobuf import text_format
import yaml
from object_detection.protos import pipeline_pb2
def get_configs_from_pipeline_file(pipeline_config_path, config_override=None):
'''
read .config and convert it to proto_buffer_object
'''
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
with tf.gfile.GFile(pipeline_config_path, "r") as f:
proto_str = f.read()
text_format.Merge(proto_str, pipeline_config)
if config_override:
text_format.Merge(config_override, pipeline_config)
#print(pipeline_config)
return pipeline_config
def create_configs_from_pipeline_proto(pipeline_config):
'''
Returns the configurations as dictionary
'''
configs = {}
configs["model"] = pipeline_config.model
configs["train_config"] = pipeline_config.train_config
configs["train_input_config"] = pipeline_config.train_input_reader
configs["eval_config"] = pipeline_config.eval_config
configs["eval_input_configs"] = pipeline_config.eval_input_reader
# Keeps eval_input_config only for backwards compatibility. All clients should
# read eval_input_configs instead.
if configs["eval_input_configs"]:
configs["eval_input_config"] = configs["eval_input_configs"][0]
if pipeline_config.HasField("graph_rewriter"):
configs["graph_rewriter_config"] = pipeline_config.graph_rewriter
return configs
configs = get_configs_from_pipeline_file('pipeline.config')
config_as_dict = create_configs_from_pipeline_proto(configs)
But when I try converting this returned dictionary to YAML with yaml.dump(config_as_dict)
it says
TypeError: can't pickle google.protobuf.pyext._message.RepeatedCompositeContainer objects
For json.dump(config_as_dict)
it says :
Traceback (most recent call last):
File "config_file_parsing.py", line 48, in <module>
config_as_json = json.dumps(config_as_dict)
File "/usr/lib/python3.5/json/__init__.py", line 230, in dumps
return _default_encoder.encode(obj)
File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
return _iterencode(o, 0)
File "/usr/lib/python3.5/json/encoder.py", line 179, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: label_map_path: "label_map.pbtxt"
shuffle: true
tf_record_input_reader {
input_path: "dataset.record"
}
is not JSON serializable
Would appreciate some help here.
JSON can only dump a subset of the python primtivies primitives and dict and list collections (with limitation on self-referencing).
YAML is more powerful, and can be used to dump arbitrary Python objects. But only if those objects can be "investigated" during the representation phase of the dump, which essentially limits that to instances of pure Python classes. For objects created at the C level, one can make explicit dumpers, and if not available Python will try and use the pickle protocol to dump the data to YAML.
Inspecing protobuf on PyPI shows me that there are non-generic wheels available, which is always an indication for some C code optimization. Inspecting one of these files indeed shows a pre-compiled shared object.
Although you make a dict out of the config, this dict can of course only be dumped when all its keys and all its values can be dumped. Since your keys are strings (necessary for JSON), you need to look at each of the values, to find the one that doesn't dump, and convert that to a dumpable object structure (dict/list for JSON, pure Python class for YAML).
You might want to take a look at Module json_format