python tensorflow keras gcloud google-ai-platform

Google AI Platform: Unexpected error when loading the model: 'str' object has no attribute 'decode' [Keras 2.3.1, TF 1.15]

I am trying to use Google's beta Custom Prediction Routine on AI Platform to serve a live version of my model.

My package includes predictor.py, which contains the following Predictor class:

import os
import numpy as np
import pickle
import keras
from keras.models import load_model

class Predictor(object):
    """Interface for constructing custom predictors."""

    def __init__(self, model, preprocessor):
        self._model = model
        self._preprocessor = preprocessor

    def predict(self, instances, **kwargs):
        """Performs custom prediction.

        Instances are the decoded values from the request. They have already
        been deserialized from JSON.

        Args:
            instances: A list of prediction input instances.
            **kwargs: A dictionary of keyword args provided as additional
                fields on the predict request body.

        Returns:
            A list of outputs containing the prediction results. This list must
            be JSON serializable.
        """
        # pre-processing
        preprocessed_inputs = self._preprocessor.preprocess(instances[0])

        # predict
        outputs = self._model.predict(preprocessed_inputs)

        # post-processing: mirror each output left-right
        outputs = np.array([np.fliplr(x) for x in outputs])
        return outputs.tolist()

    @classmethod
    def from_path(cls, model_dir):
        """Creates an instance of Predictor using the given path.

        Loading of the predictor should be done in this method.

        Args:
            model_dir: The local directory that contains the exported model
                file along with any additional files uploaded when creating the
                version resource.

        Returns:
            An instance implementing this Predictor class.
        """
        model_path = os.path.join(model_dir, 'keras.model')
        model = load_model(model_path, compile=False)

        preprocessor_path = os.path.join(model_dir, 'preprocess.pkl')
        with open(preprocessor_path, 'rb') as f:
            preprocessor = pickle.load(f)

        return cls(model, preprocessor)

The full error is

Create Version failed. Bad model detected with error: "Failed to load model: Unexpected error when loading the model: 'str' object has no attribute 'decode' (Error code: 0)"

which points to this script, specifically to loading the model. However, I can load the model locally in my notebook with the same code as in predictor.py:

from keras.models import load_model
model = load_model('keras.model', compile=False)

I have seen similar posts that suggest pinning h5py<3.0.0, but this hasn't helped. I can pin module versions for my custom prediction routine in a setup.py file like this:

from setuptools import setup

REQUIRED_PACKAGES = ['keras==2.3.1', 'h5py==2.10.0', 'opencv-python', 'pydicom', 'scikit-image']

setup(
    name='my_custom_code',
    install_requires=REQUIRED_PACKAGES,
    include_package_data=True,
    version='0.23',
    scripts=['predictor.py', 'preprocess.py'])

Unfortunately, I haven't found a good way to debug model deployment on Google's AI Platform, and the troubleshooting guide is unhelpful. Any pointers would be much appreciated. Thanks!
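One way to debug outside the slow deploy loop is to exercise the predictor locally the way the service does: call from_path on the directory you upload, then predict, then check that the result is JSON serializable. This is a minimal sketch, assuming a class shaped like the Predictor above; smoke_test and EchoPredictor are hypothetical names for illustration only. It cannot reproduce the hosted runtime's preinstalled packages, so it will only catch the h5py problem if your local package versions match the runtime's.

```python
import json


def smoke_test(predictor_cls, model_dir, instances):
    """Load a predictor the way a custom prediction routine does, then predict.

    predictor_cls: a class exposing from_path(model_dir) and predict(instances).
    Returns the JSON string the service would send back, so non-serializable
    outputs (e.g. raw numpy arrays) fail here instead of after deployment.
    """
    predictor = predictor_cls.from_path(model_dir)  # same entry point the service calls
    outputs = predictor.predict(instances)
    return json.dumps(outputs)  # raises TypeError if outputs are not JSON serializable


if __name__ == "__main__":
    # Hypothetical stand-in predictor, used only to show the call pattern.
    class EchoPredictor(object):
        @classmethod
        def from_path(cls, model_dir):
            return cls()

        def predict(self, instances, **kwargs):
            return [[len(str(x))] for x in instances]

    print(smoke_test(EchoPredictor, ".", ["abc", "de"]))
```

With the real package you would pass Predictor and the local copy of the model directory, ideally in an environment pinned to the same versions as the target runtime.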

Edit 1:

The h5py module's version is wrong: it is 3.1.0, despite being pinned to 2.10.0 in setup.py. Does anyone know why? I confirmed that Keras and the other modules are at the versions I set. I've also tried 'h5py==2.9.0' and 'h5py<3.0.0', to no avail. See the AI Platform documentation on including PyPI package dependencies for more.
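Since the runtime may keep its own h5py regardless of setup.py, it can help to fail fast in from_path with a readable message instead of the opaque decode error. A minimal sketch; major_version and require_major_below are hypothetical helpers, and the code assumes the package exposes a dotted __version__ string, as h5py does.

```python
import importlib


def major_version(version):
    """Return the leading integer of a dotted version string such as '3.1.0'."""
    return int(version.split(".")[0])


def require_major_below(module_name, limit):
    """Import module_name and raise if its major version is >= limit.

    Returns the installed version string so it can also be logged.
    """
    module = importlib.import_module(module_name)
    version = module.__version__
    if major_version(version) >= limit:
        raise RuntimeError(
            "%s %s is installed, but this model needs %s<%d.0.0"
            % (module_name, version, module_name, limit)
        )
    return version


# Hypothetical use at the top of Predictor.from_path, before load_model(...):
#     require_major_below("h5py", 3)
```

The error then names the installed version directly in the Create Version failure, which also answers the question of which h5py the runtime actually uses.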

Edit 2:

So it turns out Google does not currently support this capability.

StackOverflow, enzed01

Solution

I encountered the same problem using AI Platform with code that had run fine two months earlier, when we last trained our models. It is indeed caused by the h5py dependency, which suddenly started failing to load the .h5 model.
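For context on the message itself: Keras 2.x stores the model architecture as a JSON string in an HDF5 attribute and, when loading, calls .decode('utf-8') on the value it reads back. h5py 2.x returned that value as bytes, while h5py 3.x returns str, and str has no decode method in Python 3. The sketch below reproduces only that mismatch with plain Python values (no HDF5 file involved), plus a tolerant conversion; as_text is a hypothetical helper, not part of Keras or h5py.

```python
import json


def as_text(value):
    """Return value as str whether it arrived as bytes (h5py 2.x) or str (h5py 3.x)."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


config_json = json.dumps({"class_name": "Sequential"})
old_style = config_json.encode("utf-8")  # roughly what h5py 2.x handed back
new_style = config_json                  # what h5py 3.x hands back

# The decode-then-parse pattern works on bytes...
json.loads(old_style.decode("utf-8"))

# ...but fails on str with the same message seen in the deployment log.
try:
    json.loads(new_style.decode("utf-8"))
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'decode'

# A tolerant conversion accepts both.
assert json.loads(as_text(old_style)) == json.loads(as_text(new_style))
```

This is why pinning h5py below 3.0.0 makes the error disappear without touching the model file.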

After some trial and error, I got it working with runtime version 2.2 and Python 3.7. I am also using a custom prediction routine, and my model is a simple two-layer bidirectional LSTM that serves classifications.
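The runtime and Python versions are chosen when the model version is created. A minimal sketch that assembles that gcloud call from Python; the flag names are as I recall them from the gcloud beta reference for custom prediction routines, so treat them as assumptions and confirm with gcloud beta ai-platform versions create --help. All model names, bucket paths and the class name are hypothetical.

```python
import shlex


def create_version_cmd(model, version, origin, package_uri, prediction_class,
                       runtime_version="2.2", python_version="3.7"):
    """Build the gcloud argv for deploying a custom prediction routine version."""
    return [
        "gcloud", "beta", "ai-platform", "versions", "create", version,
        "--model", model,
        "--runtime-version", runtime_version,
        "--python-version", python_version,
        "--origin", origin,                      # directory holding model.h5, preprocess.pkl, ...
        "--package-uris", package_uri,           # sdist built from the setup.py below
        "--prediction-class", prediction_class,  # module.ClassName of the predictor
    ]


cmd = create_version_cmd(
    model="my_model",                            # hypothetical names and paths
    version="v1",
    origin="gs://my-bucket/model_dir",
    package_uri="gs://my-bucket/my_package-0.1.tar.gz",
    prediction_class="model_prediction.MyPredictor",
)
print(" ".join(shlex.quote(part) for part in cmd))
# To actually deploy: subprocess.run(cmd, check=True)
```

Keeping the versions as explicit arguments makes it easy to retry the same package against another runtime when a preinstalled dependency changes underneath you.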

I set up a notebook VM with TF 2.1 and downgraded h5py below 3.0.0 with:

!pip uninstall -y h5py

!pip install 'h5py < 3.0.0'

My setup.py looks like this:

from setuptools import setup

REQUIRED_PACKAGES = ['tensorflow==2.1', 'h5py<3.0.0']

setup(
  name="my_package",
  version="0.1",
  install_requires=REQUIRED_PACKAGES,  # without this, the pins above are never applied
  include_package_data=True,
  scripts=["preprocess.py", "model_prediction.py"]
)

I also added compile=False when loading the model. Without it, deployment failed with a different error: Create Version failed. Bad model detected with error: "Failed to load model: Unexpected error when loading the model: 'sample_weight_mode' (Error code: 0)"

The change relative to the OP's code:

model = keras.models.load_model(
    os.path.join(model_dir, 'model.h5'), compile=False)

With this change, the model deployed as before without any problems. I suspect compile=False might make prediction serving slower, but I have not noticed any slowdown so far.

Hope this helps anyone stuck and googling these issues!