I've written a custom ML Pipeline Estimator and Transformer for my own Python algorithm by following the pattern shown here.

However, in that example all the parameters needed by _transform() were conveniently passed into the Model/Transformer by the estimator's _fit() method. My transformer, though, has several parameters that control how the transform is applied. These parameters are specific to the transformer, so it would feel odd to pass them into the estimator in advance alongside the estimator-specific parameters used for fitting the model.
I can work around this by adding extra Params to the transformer, as sketched below. This works fine when I use my estimator and transformer outside of an ML Pipeline. But how can I set these transformer-specific parameters once my estimator object has been added as a stage to a Pipeline? For example, you can call getStages() on a pyspark.ml.pipeline.Pipeline and can therefore get at the estimators, but there is no corresponding getStages() method on PipelineModel. I can't see any methods for setting parameters on the PipelineModel stages either.
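For reference, the workaround looks something like this. (This is only a minimal sketch: MyModel, myParam and the "in"/"out" column names are illustrative, not my real code.)

from pyspark import keyword_only
from pyspark.ml import Model
from pyspark.ml.param import Param, Params

class MyModel(Model):
    # A transformer-specific Param; "myParam" is an illustrative name.
    myParam = Param(Params._dummy(), "myParam",
                    "controls how the transform is applied")

    @keyword_only
    def __init__(self, myParam=1):
        super(MyModel, self).__init__()
        self._setDefault(myParam=1)
        kwargs = self._input_kwargs
        self._set(**kwargs)

    def setMyParam(self, value):
        return self._set(myParam=value)

    def getMyParam(self):
        return self.getOrDefault(self.myParam)

    def _transform(self, dataset):
        # The Param value drives the transform; the columns are illustrative.
        return dataset.withColumn("out", dataset["in"] * self.getMyParam())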
So how can I set the parameters on my transformer before I call transform() on the fitted pipeline model? I'm on Spark 2.2.0.
There is no getStages() method on PipelineModel, but the same class does have an undocumented member called stages.
For example, if you've just fitted a pipeline model with 3 stages and you want to set some parameters on the second stage, you can just do something like:
# stages is an ordinary Python list, so the second stage is at index 1:
myModel = myPipelineModel.stages[1]
myModel.setMyParam(42)

# Or in one line:
# myPipelineModel.stages[1].setMyParam(42)
# Now we can push our data through the fully configured pipeline model:
resultsDF = myPipelineModel.transform(inputDF)
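As an aside, if you'd rather not mutate the stage in place, transform() also accepts an optional param map that overrides the embedded params for that one call. Assuming your transformer exposes myParam as a Param (as in the sketch above), something like this should work:

# Override the stage's param for this call only, leaving the model unchanged:
resultsDF = myPipelineModel.transform(
    inputDF, {myPipelineModel.stages[1].myParam: 42})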