machine-learning google-cloud-platform scikit-learn google-cloud-vertex-ai

Importing a scikit-learn Model on Vertex AI

I'm trying to import a scikit-learn model from my local machine into Vertex AI, but every time I get the same error in the GCP logs:

AttributeError: Can't get attribute 'preprocess_text' on <module 'model_server' from '/usr/app/model_server.py'>

The code snippet that triggers the problem is:

complaints_clf_pipeline = Pipeline(
    [
        ("preprocess", text.TfidfVectorizer(preprocessor=utils.preprocess_text, ngram_range=(1, 2))),
        ("clf", naive_bayes.MultinomialNB(alpha=0.3)),
    ]
)

preprocess_text is defined in the cell above, but the error keeps pointing at model_server, which does not appear anywhere in my code.

Can someone help?

I tried refactoring the code but got the same error. I also tried removing the pipeline structure, but then I got a different error when querying the model through the API.

Solution

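Vertex AI is trying to load the model, but it can't find the preprocess_text function. Pickle stores a function by reference (its module name plus its function name), not the function's code, so the function itself is not included in the serialized model. model_server appears to be the serving script inside the prediction container, which is why it shows up in the error even though it is not part of your code.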
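To see why the function goes missing, here is a minimal sketch that uses only the standard library. It is not Vertex AI code: training_module is a hypothetical stand-in for the notebook or module where preprocess_text was defined, and deleting the function from it simulates a serving environment that does not have it.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the notebook or module where the function was defined.
training_module = types.ModuleType("training_module")
sys.modules["training_module"] = training_module
exec(
    "def preprocess_text(text):\n"
    "    return text.lower()\n",
    training_module.__dict__,
)

# Pickle records only the module name and the function name, not the function body.
payload = pickle.dumps(training_module.preprocess_text)
stores_name = b"preprocess_text" in payload
stores_body = b"lower" in payload

# Simulate a serving environment where the module exists but the function does not.
del training_module.preprocess_text
error_message = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    error_message = str(exc)

print(stores_name, stores_body)
print(error_message)
```

The first line printed is True False: the name is in the pickle, the body is not. The second line is an AttributeError message of the same kind as the one in the question.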

When you save a scikit-learn pipeline, functions such as preprocess_text are not saved along with it. To make sure the function can be found when the model is loaded, you can either:

- define preprocess_text in the same script that loads the model, or
- package utils with your deployment (include it in your GCP deployment files) so that preprocess_text is importable in the serving environment.
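The second option can be sketched with the standard library alone. This is an illustration under assumptions, not a Vertex AI deployment: text_utils.py is a hypothetical module name playing the role of utils, a plain function is pickled in place of the full pipeline, and a fresh Python process with the module on its path stands in for the serving container.

```python
import os
import pickle
import subprocess
import sys
import tempfile

# Hypothetical module content; in the question this role is played by utils.
MODULE_SOURCE = "def preprocess_text(text):\n    return text.lower()\n"

with tempfile.TemporaryDirectory() as workdir:
    # 1. Put the function in a real, importable module file.
    with open(os.path.join(workdir, "text_utils.py"), "w") as f:
        f.write(MODULE_SOURCE)

    # 2. "Training" side: import the function from the module, then pickle it.
    sys.path.insert(0, workdir)
    import text_utils

    model_path = os.path.join(workdir, "model.pkl")
    with open(model_path, "wb") as f:
        pickle.dump(text_utils.preprocess_text, f)

    # 3. "Serving" side: a fresh interpreter that has text_utils.py on its path.
    loader = (
        "import pickle, sys\n"
        "with open(sys.argv[1], 'rb') as f:\n"
        "    fn = pickle.load(f)\n"
        "print(fn('Hello Vertex'))\n"
    )
    env = dict(os.environ, PYTHONPATH=workdir)
    result = subprocess.run(
        [sys.executable, "-c", loader, model_path],
        env=env, capture_output=True, text=True, check=True,
    )
    served_output = result.stdout.strip()
    sys.path.remove(workdir)

print(served_output)
```

Because the pickle refers to text_utils.preprocess_text and that module is importable in the second process, loading succeeds and the script prints hello vertex. Without the PYTHONPATH entry, the load would fail because the module could not be imported.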

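For example, the function can live inside a class that wraps the pipeline. Note that pickle stores the class by reference too, so the module that defines CustomTextClassifier must also be importable wherever the model is loaded:

import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


class CustomTextClassifier:
    """Wraps the pipeline together with its preprocessing function."""

    def __init__(self):
        self.pipeline = Pipeline(
            [
                # The preprocessor is a method of this class, not a notebook-level function.
                ("preprocess", TfidfVectorizer(preprocessor=self.preprocess_text, ngram_range=(1, 2))),
                ("clf", MultinomialNB(alpha=0.3)),
            ]
        )

    def preprocess_text(self, text):
        # Placeholder: put your own preprocessing logic here.
        return text.lower()

    def train(self, X, y):
        self.pipeline.fit(X, y)

    def predict(self, X):
        return self.pipeline.predict(X)


model = CustomTextClassifier()
# Train the model with your data, e.g. model.train(X_train, y_train)
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)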