Tags: google-cloud-storage, databricks, mlflow, large-language-model

How to upload and register a model to Databricks from Vertex AI?


I fine-tuned an LLM in Vertex AI, but I would like to register and load the model into Databricks so I can run inference from there. Currently I have these files in a GCS bucket:

added_tokens.json
config.json
generation_config.json
merges.txt
pytorch_model.bin
special_tokens_map.json
tokenizer_config.json
training_args.bin
vocab.json

Going off of this link, I can save a model locally to DBFS, but it doesn't mention which file(s) I need to upload. After uploading them, I assume I can then register the model in the model registry using this code:

mlflow.register_model("runs:/{run_id}/{model-path}", "{registered-model-name}")

Am I going about this correctly? I saw the two other questions about this but they didn't quite answer my question.


Solution

  • I think you're on the right track.

    I'd upload all the files you have here into a folder in DBFS, then try to load the model in a Databricks notebook.

    import torch
    import torch.nn as nn

    class Net(nn.Module):
        # Your model architecture, matching the checkpoint you want to load
        ...

    model = Net()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # according to your own configuration
    checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
    # Assumes the checkpoint dict stores model/optimizer state under these keys
    model.load_state_dict(checkpoint['model'])
    optimizer.load_state_dict(checkpoint['opt'])
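
    Given the file list in the question (config.json, tokenizer files, pytorch_model.bin), the checkpoint looks like a Hugging Face transformers fine-tune. If so, a simpler alternative sketch is to copy the whole folder to DBFS and load it directly; the DBFS path below is hypothetical, and the `isdir` guard just keeps the sketch safe to run as-is:

```python
# Alternative sketch, assuming the files came from a Hugging Face
# `transformers` fine-tune. "/dbfs/FileStore/my_llm" is a hypothetical path.
import os

from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/dbfs/FileStore/my_llm"

if os.path.isdir(model_dir):  # guard: only load if the folder exists
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
```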
    

    Once that's done, use mlflow.pytorch.log_model() to save your model to MLflow experiments. Then you can either call mlflow.register_model() or pass the registered_model_name argument when you log the model to register it directly.

    You might need to create an MLflow custom class depending on how complex your model is (MLflow pyfunc), which is often the case for LLMs as they need additional components to work. You'll then need to override the functions load_context (to load your model) and predict (how to predict with your model). There's a good example in the article linked above!