Tags: tensorflow, google-cloud-platform, image-segmentation, tpu, gcp-ai-platform-training

Using a model trained in GCP for inference?


I'm new to this topic, so please bear with me.

I've been following this tutorial to train my own segmentation model: ShapeMask on GCP. The training process completed successfully and I got the following output:

[screenshot of the training output]

Now I'm trying to use this model in the Colab notebook provided by Google: Colab

However, I'm unable to feed my trained model to it. The notebook requires a SavedModel, and I've had very little luck converting my training output into one. I'm using TF version 1.15.2 on the VM and the TPU.

There are a few steps between training and inference that I'm missing, but I don't know what they are. Any help is much appreciated. Thank you!

So far, I've tried to convert my files into a SavedModel using this, and read through this, but I could not understand how to make use of it.


Solution

  • So I was able to more or less get a SavedModel from the checkpoints using the following snippet in a Colab notebook. I had to enable the TPU runtime (Runtime > Change Runtime Type > TPU), probably because I trained on a TPU (it would throw an error otherwise).

    import os
    import tensorflow.compat.v1 as tf

    trained_checkpoint_prefix = '<GC storage bucket path>/model.ckpt-1000'
    export_dir = '<GC storage bucket path>'
    tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']

    graph = tf.Graph()
    with tf.Session(target=tpu_address, graph=graph) as sess:
        # Restore the graph and variables from the checkpoint
        loader = tf.train.import_meta_graph(trained_checkpoint_prefix + '.meta', clear_devices=True)
        loader.restore(sess, trained_checkpoint_prefix)
        # Export the restored session as a SavedModel
        builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
        builder.add_meta_graph_and_variables(sess, [tf.saved_model.TRAINING, tf.saved_model.SERVING], strip_default_attrs=True)
        builder.save()
    
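    To sanity-check the export, the SavedModel can be loaded back and a few of its operation names printed. A minimal sketch, assuming the same export_dir as above; note that the tag set passed to the loader must exactly match the one given to the builder:

    import tensorflow.compat.v1 as tf

    export_dir = '<GC storage bucket path>'  # same directory the builder wrote to

    with tf.Session(graph=tf.Graph()) as sess:
        # The tag set must exactly match what the builder recorded above
        tf.saved_model.loader.load(sess, [tf.saved_model.TRAINING, tf.saved_model.SERVING], export_dir)
        # Print a few operation names to see what the graph actually contains
        for op in sess.graph.get_operations()[:20]:
            print(op.name, op.type)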

    Now I say more or less because this SavedModel, plugged into the Colab tutorial notebook, doesn't work. The notebook reads the model successfully in cell 6, but I get an error in the inference part, right here:

    num_detections, detection_boxes, detection_classes, detection_scores, detection_masks, detection_outer_boxes, image_info = session.run(
        ['NumDetections:0', 'DetectionBoxes:0', 'DetectionClasses:0', 'DetectionScores:0', 'DetectionMasks:0', 'DetectionOuterBoxes:0', 'ImageInfo:0'],
        feed_dict={'Placeholder:0': np_image_string})
    

    The process ends with the following error:

    KeyError: "The name 'Placeholder:0' refers to a Tensor which does not exist. The operation, 'Placeholder', does not exist in the graph." 
    

    It also cannot find any of the other tensor names. I'm not sure what's causing this and will update the answer once I do!
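
    A quick way to see which input names the restored graph actually exposes (a debugging sketch, reusing the notebook's session variable) is to list its placeholder ops:

    # List every placeholder op in the loaded graph; the expected
    # 'Placeholder' input should show up here if it exists at all
    placeholders = [op.name for op in session.graph.get_operations()
                    if op.type == 'Placeholder']
    print(placeholders)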

    EDIT1:

    I solved the issue using the following readme.

    First, I used TF 2.2 and the main branch of the TPU repo instead of the shapemask branch, then followed the exact steps from the original tutorial for training, and used the following command to export the SavedModel:

    python ~/tpu/models/official/detection/export_saved_model.py \
        --export_dir="${EXPORT_DIR?}" \
        --checkpoint_path="${CHECKPOINT_PATH?}" \
        --params_override="${PARAMS_OVERRIDE?}" \
        --batch_size=${BATCH_SIZE?} \
        --input_type="${INPUT_TYPE?}" \
        --input_name="${INPUT_NAME?}"
    

    Here the --params_override flag should be passed the params.yaml file created during training, and --batch_size is set to 1 to process one image at a time. More details can be found in the readme.
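
    For illustration, the environment variables might be filled in like this (the bucket name, file locations, and the input type/name are assumptions; check where your training run actually wrote params.yaml and see the readme for the accepted --input_type values):

    # Illustrative values only -- adjust to your bucket and model_dir
    export EXPORT_DIR="gs://<your-bucket>/shapemask/export"
    export CHECKPOINT_PATH="gs://<your-bucket>/shapemask/model_dir/model.ckpt-1000"
    export PARAMS_OVERRIDE="gs://<your-bucket>/shapemask/model_dir/params.yaml"
    export BATCH_SIZE=1
    export INPUT_TYPE="image_bytes"
    export INPUT_NAME="input"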

    Note: I had to comment out the following line for it to execute:

    from serving import segmentation
    

    This exported the model, and I was able to load and use it in the Colab notebook with some minor adjustments.
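
    For reference, loading the exported model in TF 2.x looks roughly like this (a sketch; '<EXPORT_DIR>', the 'serving_default' signature key, and the encoded-image input are assumptions to verify against the notebook):

    import numpy as np
    import tensorflow as tf  # TF 2.x

    # Load the exported SavedModel and grab its serving signature
    imported = tf.saved_model.load('<EXPORT_DIR>')
    serving_fn = imported.signatures['serving_default']

    # Feed a single encoded image as a string tensor, as the notebook does
    with tf.io.gfile.GFile('<path to test image>', 'rb') as f:
        np_image_string = np.array([f.read()])

    outputs = serving_fn(tf.constant(np_image_string))
    print(list(outputs.keys()))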