I have a PySpark pipeline model saved on HDFS like this:
stages = feature_stage_list \
    + label_stage_list \
    + assembler_stage_list \
    + classifier_list \
    + label_converter_stage_list
pipeline = Pipeline(stages=stages)
pipeline_model = pipeline.fit(train_df)  # fit on the training DataFrame (variable name assumed)
pipeline_model.save(path)
where the classifier is a LogisticRegression model. I would like to load the same model in a separate Spark job and access its parameters. The following is used to load the model:
pipeline_model = PipelineModel.load(path)
Now when I try to get the model's stages or parameters, nothing is returned. I have tried the following so far:
pipeline_model.params
pipeline_model.explainParams()
printing the type of loaded object:
<class 'pyspark.ml.pipeline.PipelineModel'>
I'm surprised that these come back empty, since we also save the same model with MLflow, and I can see both the stages and the parameters of the model when I look at its artifacts. Am I missing something here?
For those who might want to do the same, I ended up doing the following to get the parameters:
pipeline_model.stages[-2].getMaxIter()
pipeline_model.stages[-2].getElasticNetParam()
pipeline_model.stages[-2].getFamily()
...
Note that, according to the stages listed in the original post, classifier_list is the second-to-last stage, which is why it is accessed with pipeline_model.stages[-2]; make sure to adjust the index to match the stage order of your own pipeline.
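A more general option, if you would rather not rely on the position of the classifier in the stage list, is to walk over pipeline_model.stages and inspect each fitted stage, or pick the classifier out by type. The sketch below assumes the classifier stage is a LogisticRegressionModel, as in the original pipeline:

from pyspark.ml import PipelineModel
from pyspark.ml.classification import LogisticRegressionModel

pipeline_model = PipelineModel.load(path)

# print every fitted stage together with its full parameter documentation
for stage in pipeline_model.stages:
    print(type(stage).__name__)
    print(stage.explainParams())

# or locate the classifier stage by type instead of by position
lr_model = next(s for s in pipeline_model.stages
                if isinstance(s, LogisticRegressionModel))
print(lr_model.getMaxIter(), lr_model.getElasticNetParam(), lr_model.getFamily())

This way the lookup keeps working even if stages are later added to or removed from the pipeline.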