Tags: tensorflow, deep-learning, tensorflow-serving, tensorflow-estimator, ensemble-learning

Ensemble two tensorflow models


I'm trying to create a single model out of two almost identical models, trained under different conditions, and average their outputs inside TensorFlow. We want the final model to keep the same interface for inference.

We have saved a checkpoint of the two models, and here is how we are trying to solve the problem:

merged_graph = tf.Graph()
with merged_graph.as_default():
    saver1 = tf.train.import_meta_graph('path_to_checkpoint1_model1.meta', import_scope='g1')
    saver2 = tf.train.import_meta_graph('path_to_checkpoint1_model2.meta', import_scope='g2')

with tf.Session(graph=merged_graph) as sess:
  # initialize first, then restore, so the restored weights aren't overwritten
  sess.run(tf.global_variables_initializer())
  saver1.restore(sess, 'path_to_checkpoint1_model1')
  saver2.restore(sess, 'path_to_checkpoint1_model2')

  # export as a saved_model
  builder = tf.saved_model.builder.SavedModelBuilder(kPathToExportDir)
  builder.add_meta_graph_and_variables(sess,
                                       [tf.saved_model.tag_constants.SERVING],
                                       strip_default_attrs=True)    
  builder.save()

There are at least three flaws in the above approach, and we have tried many routes but can't get it to work:

  1. The graphs for model1 and model2 each have their own main op. As a result, the model fails during loading with the following error:

Failed precondition:
Expected exactly one main op in : model
Expected exactly one SavedModel main op. Found: [u'g1/group_deps', u'g2/group_deps']

  2. The two models have their own Placeholder nodes for input (i.e. g1/Placeholder and g2/Placeholder after merging). We couldn't find a way to remove these Placeholder nodes and create a new one that feeds input to both models (we don't want a new interface where we have to feed data into two different placeholders).

  3. The two graphs have their own init_all and restore_all nodes. We couldn't figure out how to combine these NoOp operations into single nodes. This is the same kind of problem as #1.
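For the placeholder problem in particular, `tf.train.import_meta_graph` accepts an `input_map` argument that rewires an imported graph's Placeholder onto a tensor already in the new graph, so one shared input can feed every imported model. Below is a minimal, self-contained sketch of that pattern; it uses `tf.compat.v1` so it also runs on TF 2.x installs, and all model and tensor names are illustrative, not taken from the actual checkpoints:

```python
import os
import tempfile
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

tmp = tempfile.mkdtemp()

# Build and checkpoint two toy "models" y = w * x with different weights.
for i, w in enumerate([2.0, 4.0]):
    g = tf.Graph()
    with g.as_default():
        x = tf.placeholder(tf.float32, shape=[None], name="Placeholder")
        y = tf.multiply(tf.Variable(w), x, name="output")
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            tf.train.Saver().save(sess, os.path.join(tmp, "model%d" % i))

# Merge: one shared placeholder feeds both imports via input_map.
merged = tf.Graph()
with merged.as_default():
    shared = tf.placeholder(tf.float32, shape=[None], name="shared_input")
    savers, outputs = [], []
    for i in range(2):
        savers.append(tf.train.import_meta_graph(
            os.path.join(tmp, "model%d.meta" % i),
            import_scope="g%d" % i,
            input_map={"Placeholder:0": shared}))
        outputs.append(merged.get_tensor_by_name("g%d/output:0" % i))
    avg = tf.reduce_mean(tf.stack(outputs), axis=0)

with tf.Session(graph=merged) as sess:
    # Each saver returned by import_meta_graph restores its own scope.
    for i, saver in enumerate(savers):
        saver.restore(sess, os.path.join(tmp, "model%d" % i))
    result = sess.run(avg, feed_dict={shared: [1.0, 2.0]})
# result is the elementwise average: (2x + 4x) / 2 = 3x
```

Note this only addresses the input remapping; the duplicated main ops remain a separate problem when exporting with the saved_model API.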

We also couldn't find a sample implementation of such model ensembling inside TensorFlow. Sample code might answer all of the above questions.

Note: My two models were trained using tf.estimator.Estimator and exported as saved_models. As a result, each contains a main_op.


Solution

  • I did not solve these problems directly, but found a workaround.

    The main problem is that a main_op node is added whenever a model is exported with the saved_model API. Since both of my models were exported with this API, both had a main_op node, and both nodes were imported into the new graph. The new graph therefore contained two main_ops, and it later fails to load because exactly one main op is expected.
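You can confirm this by parsing a SavedModel's saved_model.pb directly and checking for the main-op collection key ("saved_model_main_op", exposed as tf.saved_model.constants.MAIN_OP_KEY in TF 1.x). A self-contained sketch with a toy model exported the same way (via tf.compat.v1; the model itself is illustrative):

```python
import os
import tempfile
import tensorflow.compat.v1 as tf
from tensorflow.core.protobuf import saved_model_pb2
tf.disable_v2_behavior()

export_dir = os.path.join(tempfile.mkdtemp(), "sm")

# Export a toy model with an explicit main_op, as Estimator's export does.
g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32, name="Placeholder")
    y = tf.multiply(tf.Variable(2.0), x, name="output")
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
        builder.add_meta_graph_and_variables(
            sess, [tf.saved_model.tag_constants.SERVING],
            main_op=tf.group(tf.tables_initializer()))
        builder.save()

# Parse saved_model.pb and look for the main-op collection key.
sm = saved_model_pb2.SavedModel()
with open(os.path.join(export_dir, "saved_model.pb"), "rb") as f:
    sm.ParseFromString(f.read())
collection_keys = set(sm.meta_graphs[0].collection_def.keys())
has_main_op = "saved_model_main_op" in collection_keys
```

Importing two such models into one graph brings in two of these collections, which is exactly what the loader rejects.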

    The workaround I chose was not to export my final model with the saved_model API, but to export it with the old freeze_graph utility into a single .pb file.
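If you'd rather stay in Python than drive the freeze_graph tool, `tf.graph_util.convert_variables_to_constants` achieves the same thing in-process: it bakes the session's variable values into constants in a GraphDef, which can then be run without any checkpoint. A toy sketch of that alternative (tf.compat.v1; names are illustrative):

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Toy graph: y = 3 * x with a trainable weight.
g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32, shape=[None], name="input")
    y = tf.multiply(tf.Variable(3.0), x, name="final_output")
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Bake the variable into a constant; only the subgraph reaching
        # final_output is kept.
        frozen_def = tf.graph_util.convert_variables_to_constants(
            sess, g.as_graph_def(), ["final_output"])

# Reload the frozen GraphDef and run it with no checkpoint at all.
g2 = tf.Graph()
with g2.as_default():
    tf.import_graph_def(frozen_def, name="")
with tf.Session(graph=g2) as sess:
    frozen_result = sess.run("final_output:0",
                             feed_dict={"input:0": [1.0, 2.0]})
```

The freeze_graph script used below is essentially a command-line wrapper around this same conversion.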

    Here is my working code snippet:

    import os
    import tensorflow as tf
    from tensorflow.python.tools import freeze_graph

    # set some constants:
    #   INPUT_SHAPE, OUTPUT_NODE_NAME, OUTPUT_FILE_NAME, 
    #   TEMP_DIR, TEMP_NAME, SCOPE_PREPEND_NAME, EXPORT_DIR
    
    # Set path for trained models which are exported with the saved_model API
    input_model_paths = [PATH_TO_MODEL1, 
                         PATH_TO_MODEL2, 
                         PATH_TO_MODEL3, ...]
    num_model = len(input_model_paths)
    
    def load_model(sess, path, scope, input_node):
        tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], 
                                   path,
                                   import_scope=scope, 
                                   input_map={"Placeholder": input_node})  
        output_tensor = tf.get_default_graph().get_tensor_by_name(
            scope + "/" + OUTPUT_NODE_NAME + ":0")
        return output_tensor  
    
    with tf.Session(graph=tf.Graph()) as sess:
      new_input = tf.placeholder(dtype=tf.float32, 
                                 shape=INPUT_SHAPE, name="Placeholder")      
    
      output_tensors = []
      for k, path in enumerate(input_model_paths):
        output_tensors.append(load_model(sess, 
                                         path, 
                                         SCOPE_PREPEND_NAME+str(k), 
                                         new_input))
      # Mix together the outputs (e.g. sum, weighted sum, etc.)
      sum_outputs = output_tensors[0] + output_tensors[1]
      for i in range(2, num_model):
        sum_outputs = sum_outputs + output_tensors[i]
      final_output = tf.divide(sum_outputs, float(num_model), name=OUTPUT_NODE_NAME)
    
      # Save checkpoint to be loaded later by the freeze_graph!
      saver_checkpoint = tf.train.Saver()
      saver_checkpoint.save(sess, os.path.join(TEMP_DIR, TEMP_NAME))
    
      tf.train.write_graph(sess.graph_def, TEMP_DIR, TEMP_NAME + ".pbtxt")
      freeze_graph.freeze_graph(
          os.path.join(TEMP_DIR, TEMP_NAME + ".pbtxt"), 
          "", 
          False, 
          os.path.join(TEMP_DIR, TEMP_NAME),  
          OUTPUT_NODE_NAME, 
          "", # deprecated
          "", # deprecated
          os.path.join(EXPORT_DIR, OUTPUT_FILE_NAME),
          False,
          "")
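On the inference side, consumers of the exported .pb load it by parsing the GraphDef and calling `tf.import_graph_def`. The sketch below is self-contained, so it first writes a toy frozen graph to stand in for the real EXPORT_DIR/OUTPUT_FILE_NAME (which are placeholders here); the loading half is the part that matters:

```python
import os
import tempfile
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

pb_path = os.path.join(tempfile.mkdtemp(), "ensemble.pb")

# Stand-in for the real export: a tiny frozen graph y = 0.5 * x.
g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32, shape=[None], name="Placeholder")
    y = tf.multiply(x, tf.constant(0.5), name="final_output")
    with tf.gfile.GFile(pb_path, "wb") as f:
        f.write(g.as_graph_def().SerializeToString())

# Inference-side loading: parse the GraphDef and import it.
graph_def = tf.GraphDef()
with tf.gfile.GFile(pb_path, "rb") as f:
    graph_def.ParseFromString(f.read())

infer_graph = tf.Graph()
with infer_graph.as_default():
    tf.import_graph_def(graph_def, name="")

with tf.Session(graph=infer_graph) as sess:
    pred = sess.run("final_output:0",
                    feed_dict={"Placeholder:0": [2.0, 4.0]})
```

With the real export, pb_path would point at the file written by freeze_graph, and the fetch/feed names would be OUTPUT_NODE_NAME and the new placeholder's name.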