machine-learning google-cloud-platform scikit-learn google-cloud-vertex-ai

Importing a Model with Scikit Learn on Vertex


I'm trying to import a model from my local machine, but every time I get the same error in the GCP logs. The framework is scikit-learn, and the error is:

AttributeError: Can't get attribute 'preprocess_text' on <module 'model_server' from '/usr/app/model_server.py'>

The code snippet with this problem is:

complaints_clf_pipeline = Pipeline(
    [
        ("preprocess", text.TfidfVectorizer(preprocessor=utils.preprocess_text, ngram_range=(1, 2))),
        ("clf", naive_bayes.MultinomialNB(alpha=0.3)),
    ]
)

Here preprocess_text comes from the cell above, but I keep getting this error about model_server, which is not present in my code.

Can someone help?

I tried to refactor the code but got the same error. I also tried removing the pipeline structure, but then I got a different error when querying the model through the API.


Solution

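  • GCP is trying to load the model, but it can't find the preprocess_text function because the function is not included in the serialized model.

    When you save a scikit-learn pipeline, functions like preprocess_text are not saved with it: pickle stores only a reference to the function (its module and name), not its code. A function defined in a notebook cell is recorded as belonging to __main__, and in the serving container __main__ is most likely model_server.py, which would explain why the error names a module you never wrote.

    To make sure GCP can find the function, you can either define preprocess_text inside the same script that loads the model, or package utils as part of your deployment (include it in your GCP deployment files) so that preprocess_text can be imported in the serving environment.

    The example below keeps the preprocessing function next to the pipeline by wrapping both in one class. The class itself is still pickled by reference, so it has to live in a module that the serving environment can import: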

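    import pickle

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import Pipeline


    class CustomTextClassifier:
        """Bundles the text preprocessing with the scikit-learn pipeline."""

        def __init__(self):
            self.pipeline = Pipeline(
                [
                    # The bound method is pickled as a reference to this instance.
                    ("preprocess", TfidfVectorizer(preprocessor=self.preprocess_text, ngram_range=(1, 2))),
                    ("clf", MultinomialNB(alpha=0.3)),
                ]
            )

        def preprocess_text(self, text):
            # Placeholder: replace with your own preprocessing logic.
            return text.lower()

        def train(self, X, y):
            self.pipeline.fit(X, y)

        def predict(self, X):
            return self.pipeline.predict(X)


    # pickle stores the class by reference too, so CustomTextClassifier must be
    # importable wherever model.pkl is loaded.
    model = CustomTextClassifier()
    # Train the model with your data, e.g. model.train(X, y), before saving.
    with open("model.pkl", "wb") as f:
        pickle.dump(model, f)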
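
To see why packaging the module matters, here is a minimal sketch that uses only the standard library. It is not Vertex-specific: the module name text_utils_demo is hypothetical (it stands in for utils), and a temporary directory plays the role of the deployment package. It pickles a function, tries to load it in an "environment" where the module is missing, then loads it again once the module is importable.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical module name and body; they stand in for the question's `utils`.
MODULE_NAME = "text_utils_demo"
MODULE_SOURCE = "def preprocess_text(text):\n    return text.lower()\n"

with tempfile.TemporaryDirectory() as pkg_dir:
    Path(pkg_dir, f"{MODULE_NAME}.py").write_text(MODULE_SOURCE)

    # "Training" environment: the module is importable, so pickling works.
    sys.path.insert(0, pkg_dir)
    importlib.invalidate_caches()
    module = importlib.import_module(MODULE_NAME)
    payload = pickle.dumps(module.preprocess_text)

    # The payload names the module and the function; it does not carry the code.
    references_by_name = (
        MODULE_NAME.encode() in payload and b"preprocess_text" in payload
    )

    # "Serving" environment without the module: loading fails.
    sys.path.remove(pkg_dir)
    del sys.modules[MODULE_NAME]
    try:
        pickle.loads(payload)
        load_without_module = "ok"
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        load_without_module = type(exc).__name__

    # "Serving" environment with the module shipped alongside: loading works.
    sys.path.insert(0, pkg_dir)
    importlib.invalidate_caches()
    restored = pickle.loads(payload)
    result = restored("Hello WORLD")

    # Clean up so the demo leaves the interpreter state as it found it.
    sys.path.remove(pkg_dir)
    sys.modules.pop(MODULE_NAME, None)

print(references_by_name, load_without_module, result)
```

The same applies to a pickled pipeline or wrapper class: whatever module the pickle refers to has to be importable under the same name in the serving environment. In the question the lookup fails with AttributeError rather than ModuleNotFoundError because the module named in the pickle exists there but does not define preprocess_text.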