scikit-learn, joblib

Confusion regarding joblib.dump()


One way to save sklearn models is to use joblib.dump(model, filename). I am confused about the filename argument. One way to call this function is:

joblib.dump(model,"model.joblib")

This saves the model successfully, and the model loads correctly with:

model=joblib.load("model.joblib")

Another way is to use:

joblib.dump(model,"model")

This time there is no ".joblib" extension. It also runs successfully, and the model loads correctly with:

model=joblib.load("model")

What confuses me is the file extension in the filename. Is there a particular file extension that I should use when saving the model, or is an extension unnecessary, as in the second example? If it is unnecessary, why?


Solution

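  • There is no file extension that "must" be used to serialize a model. joblib.load detects the format from the file's contents, not from its name, so any extension (or none) works. The extension only matters to joblib.dump when it is one of the supported compression extensions (.z, .gz, .bz2, .xz or .lzma), in which case the matching compression method is selected automatically. By default joblib applies no compression; if you pass compress=True or a compression level, zlib is used unless you name another method.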

    Therefore you can use any file extension. However, it is good practice to use the name of the library as the extension, so that you know how to load the file later.

    I name my serialized model model.pickle when I use the pickle library and model.joblib when I use joblib.
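
    To check this yourself, here is a minimal sketch. It assumes joblib is installed and uses a plain dict as a stand-in for a fitted model (any picklable object should behave the same way). The roundtrip helper and the filenames are only illustrative; they are not part of joblib.

```python
import os
import tempfile

import joblib

# Stand-in for a fitted sklearn model (assumption: any picklable object works the same).
model = {"coef": [0.5, -1.2, 3.0], "intercept": 0.1}


def roundtrip(obj, filename, **dump_kwargs):
    """Dump obj under the given filename, then return (loaded object, first 2 bytes of the file)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, filename)
        joblib.dump(obj, path, **dump_kwargs)
        with open(path, "rb") as fh:
            magic = fh.read(2)
        return joblib.load(path), magic


for name in ["model.joblib", "model", "model.anything", "model.gz"]:
    loaded, magic = roundtrip(model, name)
    # Loading succeeds whatever the name is; only the bytes on disk differ.
    print(name, loaded == model, magic)
```

    Only model.gz should start with the gzip signature bytes 1f 8b, because the .gz extension switches compression on. The other names are written uncompressed unless you pass something like compress=3.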