I am in main.py at the root directory, calling the model script to train the model. The directory looks like this:
After training the model, I am planning to save and log the PyTorch model using MLflow. Here’s the code
# Registering the model to the workspace
mlflow.pytorch.log_model(
    pytorch_model=model,
    registered_model_name="use-case1-model",
    artifact_path="use-case1-model",
    input_example=df[['Title', 'Attributes']],
    conda_env=os.path.join("./dependencies", "conda.yaml"),
    code_paths=["./models"],  # code_paths expects a list of paths
)
# Saving the model to a file
mlflow.pytorch.save_model(
    pytorch_model=model,
    conda_env=os.path.join("./dependencies", "conda.yaml"),
    input_example=df[['Title', 'Attributes']],
    path=os.path.join(args.model, "use-case1-model"),
    code_paths=["./models"],  # code_paths expects a list of paths
)
But I am getting an error while saving the code paths, saying the directory is not found.
Question 1: Do I need to pass the code_paths and extra_files parameters in my case?
Question 2: What is the right way to specify the code_paths directory?
https://mlflow.org/docs/latest/python_api/mlflow.pytorch.html
As per the function definition, the code_paths parameter takes a list of local filesystem paths to Python file dependencies (or directories containing file dependencies).
If your model has such dependencies, you need to provide their paths as a list to code_paths.
The "directory not found" error can be resolved by using absolute paths, as below.
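On Question 1, it may help to see why such dependencies matter. The documentation linked above says that files passed via code_paths are prepended to the system path when the model is loaded. The sketch below imitates that effect with the standard library only; the module name my_model_def and the temporary directory are hypothetical stand-ins, and this is not MLflow's actual loading code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a directory shipped with the model via code_paths.
code_dir = Path(tempfile.mkdtemp())
(code_dir / "my_model_def.py").write_text(
    "class Net:\n    name = 'use-case1-model'\n"
)

# Before the directory is on sys.path, the module cannot be imported.
try:
    importlib.import_module("my_model_def")
    importable_before = True
except ModuleNotFoundError:
    importable_before = False

# Prepending the directory (the effect the docs describe) makes it importable.
sys.path.insert(0, str(code_dir))
importlib.invalidate_caches()
module = importlib.import_module("my_model_def")
net_name = module.Net.name
```

So if the model class is defined in your own ./models directory and is needed to load the model, passing that directory is probably worthwhile. If everything the model needs comes from installed packages, you can likely omit it.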
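import os

# Build absolute paths starting from the current working directory
code_pth = os.path.join(os.path.abspath(""), "media", "model")
conda_env = os.path.join(os.path.abspath(""), "dependencies")
print(conda_env)
print(code_pth)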
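Note that os.path.abspath("") resolves against the current working directory, so the result still depends on where the script is launched from. A minimal sketch of a more defensive variant follows, assuming you know a base directory for the project; resolve_code_paths is a hypothetical helper (not part of MLflow), and the temporary directory only stands in for a real project root.

```python
import os
import tempfile


def resolve_code_paths(base_dir, relative_paths):
    """Return absolute paths under base_dir, failing early if any is missing."""
    resolved = []
    for rel in relative_paths:
        full = os.path.abspath(os.path.join(base_dir, rel))
        if not os.path.exists(full):
            raise FileNotFoundError(f"code path not found: {full}")
        resolved.append(full)
    return resolved


# Demo: a temporary directory stands in for the project root.
project_root = tempfile.mkdtemp()
os.makedirs(os.path.join(project_root, "models"))
code_paths = resolve_code_paths(project_root, ["models"])
```

In main.py the base directory could be os.path.dirname(os.path.abspath(__file__)), and the returned list can then be passed as code_paths. Checking for existence up front means a wrong path is reported before any MLflow call is made.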
I used an sklearn model to demonstrate logging and saving.
mlflow.sklearn.log_model(
    sk_model=clf,
    registered_model_name=registered_model_name,
    artifact_path=registered_model_name,
    code_paths=[code_pth],
    conda_env=os.path.join(conda_env, "conda.yaml"),
)
mlflow.sklearn.save_model(
    sk_model=clf,
    path=os.path.join(registered_model_name, "trained_model"),
    code_paths=[code_pth],
    conda_env=os.path.join(conda_env, "conda.yaml"),
)