Tags: python-2.7, csv, tensorflow, k-means, google-cloud-ml-engine

Export a KMeans model using export_savedmodel to deploy on ml-engine


I'm doing K-means clustering using tensorflow.contrib.learn.KMeansClustering.

I can use the default model to predict locally, but since I want to use ml-engine online prediction, I must export it with export_savedmodel.

I have googled in lots of places, but since the KMeansClustering class needs no feature columns, I don't know how to build the correct serving_input_fn for export_savedmodel.

Here is my code:

import pandas as pd
import tensorflow as tf
from tensorflow.contrib.learn import KMeansClustering
from tensorflow.python.lib.io import file_io

# Generate input_fn
def gen_input(data):
    return tf.constant(data.as_matrix(), tf.float32, data.shape), None

# Declare dataset + export model path
TRAIN = 'train.csv'
MODEL = 'model'

# Read dataset
body = pd.read_csv(
    file_io.FileIO(TRAIN, mode='r'),
    delimiter=',',
    header=None,
    engine='python'
)

# Declare K-Means
km = KMeansClustering(
    num_clusters=2,
    model_dir=MODEL,
    relative_tolerance=0.1
)

est = km.fit(input_fn=lambda: gen_input(body))

# This is where I'm stuck
fcols = [tf.contrib.layers.real_valued_column('x', dimension=5)]
fspec = tf.contrib.layers.create_feature_spec_for_parsing(fcols)
serving_input_fn = tf.contrib.learn.python.learn.\
                   utils.input_fn_utils.build_parsing_serving_input_fn(fspec)
est.export_savedmodel(MODEL, serving_input_fn)

Here is my toy train.csv:

1,2,3,4,5
2,3,4,5,6
3,4,5,6,7
5,4,3,2,1
7,6,5,4,3
8,7,6,5,4

The exported model has the expected layout: a saved_model.pb with its variables folder.

Deploying the model to ml-engine was successful, but when predicting with the same train.csv I got the following error:

{"error": "Prediction failed: Exception during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details=\"Name: <unknown>, Feature: x (data type: float) is required but could not be found.\n\t [[Node: ParseExample/ParseExample = ParseExample[Ndense=1, Nsparse=0, Tdense=[DT_FLOAT], _output_shapes=-1,5, dense_shapes=5, sparse_types=[], _device=\"/job:localhost/replica:0/task:0/cpu:0\"](_arg_input_example_tensor_0_0, ParseExample/ParseExample/names, ParseExample/ParseExample/dense_keys_0, ParseExample/Const)]]\")"}

I have struggled with this for a month, while all the documents I have found are for the pure API.

I'm looking forward to your advice.

Thanks in advance.


Solution

  • The Census sample shows how to set up the serving_input_fn for CSV. Adjusted for your example:

    CSV_COLUMNS = ['feat1', 'feat2', 'feat3', 'feat4', 'feat5']
    CSV_COLUMN_DEFAULTS = [[0.0],[0.0],[0.0],[0.0],[0.0]] 
    
    def parse_csv(rows_string_tensor):
      """Takes the string input tensor and returns a dict of rank-2 tensors."""
    
      # Takes a rank-1 tensor and converts it into rank-2 tensor
      # Example if the data is ['csv,line,1', 'csv,line,2', ..] to
      # [['csv,line,1'], ['csv,line,2']] which after parsing will result in a
      # tuple of tensors: [['csv'], ['csv']], [['line'], ['line']], [[1], [2]]
      row_columns = tf.expand_dims(rows_string_tensor, -1)
      columns = tf.decode_csv(row_columns, record_defaults=CSV_COLUMN_DEFAULTS)
      features = dict(zip(CSV_COLUMNS, columns))
    
      return features
    
    def csv_serving_input_fn():
      """Build the serving inputs."""
      csv_row = tf.placeholder(
          shape=[None],
          dtype=tf.string
      )
      features = parse_csv(csv_row)
      return tf.contrib.learn.InputFnOps(features, None, {'csv_row': csv_row})
    
    # No need for fcols/fspec; pass the CSV serving input fn defined above
    est.export_savedmodel(MODEL, csv_serving_input_fn)
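
With this serving_input_fn, the model's single serving input is the string placeholder named 'csv_row', so each online-prediction instance must be a JSON object with a 'csv_row' key holding one raw CSV line. As a sketch (the helper names are mine, not part of any API), you could build the newline-delimited JSON file that `gcloud ml-engine predict --json-instances` consumes like this:

```python
import json

def csv_lines_to_instances(csv_lines):
    """Wrap each raw CSV line in a dict keyed by the serving input name 'csv_row'."""
    return [{'csv_row': line.strip()} for line in csv_lines if line.strip()]

def write_instances(csv_lines, path):
    # The --json-instances format is one JSON object per line.
    with open(path, 'w') as f:
        for instance in csv_lines_to_instances(csv_lines):
            f.write(json.dumps(instance) + '\n')

rows = ['1,2,3,4,5', '2,3,4,5,6']
print(csv_lines_to_instances(rows))
# [{'csv_row': '1,2,3,4,5'}, {'csv_row': '2,3,4,5,6'}]
```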
    

    TensorFlow 1.4 will simplify at least some of this.

    Also, consider using JSON, as that is the more standard approach for serving. Happy to provide details upon request.
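
    If you go the JSON route, each instance would carry named float features instead of an opaque CSV string (the serving fn would then expose one placeholder per feature; the helper below is only a sketch of how you might build such requests from your CSV rows, reusing the CSV_COLUMNS names above):

```python
import csv

CSV_COLUMNS = ['feat1', 'feat2', 'feat3', 'feat4', 'feat5']

def rows_to_json_instances(csv_text):
    # Parse each CSV row and key the values by feature name, so the
    # request body carries named floats rather than a raw CSV string.
    instances = []
    for row in csv.reader(csv_text.splitlines()):
        if row:
            instances.append(dict(zip(CSV_COLUMNS, map(float, row))))
    return instances

print(rows_to_json_instances('1,2,3,4,5\n2,3,4,5,6\n'))
```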