I had trouble extracting hyperparameters from a PySpark model after fitting it through a Pipeline and CrossValidator.
I found the following answer on Stack Overflow: How to extract model hyper-parameters from spark.ml in PySpark?
It was very helpful, and the following line worked for me:
modelOnly.bestModel.stages[-1]._java_obj.parent().getRegParam()
The new problem is that I'm running an MLP, and when I try to extract the layers I get what looks like a random string of characters instead of a Python list.
Result:
StepSize: 0.03
Layers: [I@db98c25
My code roughly was:
trainer = MultilayerPerceptronClassifier(featuresCol='features',
                                         labelCol='label',
                                         predictionCol='prediction',
                                         maxIter=100,
                                         tol=1e-06,
                                         seed=1331,
                                         layers=layers1,
                                         blockSize=128,
                                         stepSize=0.03,
                                         solver='l-bfgs',
                                         initialWeights=None,
                                         probabilityCol='probability',
                                         rawPredictionCol='rawPrediction')
pipeline = Pipeline(stages=[assembler1,stringIdx,trainer])
paramGrid = ParamGridBuilder() \
    .addGrid(trainer.maxIter, [10]) \
    .addGrid(trainer.tol, [1e-06]) \
    .addGrid(trainer.stepSize, [0.03]) \
    .addGrid(trainer.layers, [layers2]) \
    .build()
crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=MulticlassClassificationEvaluator(metricName="accuracy"),
                          numFolds=3)
cvModel = crossval.fit(df)
mybestmodel = cvModel.bestModel
java_model = mybestmodel.stages[-1]._java_obj
print("StepSize: ", end='')
print(java_model.parent().getStepSize())
print("Layers: ", end='')
print(java_model.parent().getLayers())
I'm running Spark 2.3.2.
What am I missing?
Thanks :)
That's not a random string; it's the default string representation of the underlying Java object — here a Java `int[]`, which prints as `[I@<hash>`.
While in theory you could convert the Java array element by element:

[x for x in mybestmodel.stages[-1]._java_obj.parent().getLayers()]

there is really no need for that. The Python model already exposes the layer sizes directly (quoting the PySpark docs):

layers
    array of layer sizes including input and output layers.
    New in version 1.6.0.

i.e.

mybestmodel.stages[-1].layers
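To illustrate what is going on without a running Spark session, here is a minimal sketch. `JavaIntArrayProxy` is a hypothetical stand-in (not py4j itself) for how py4j wraps a Java `int[]`: the object is iterable from Python, but its string form is Java's default `[I@<hexhash>`, which is what ends up in your print output. Iterating it — e.g. wrapping it in `list()` — recovers the actual values:

```python
class JavaIntArrayProxy:
    """Stand-in for a py4j proxy around a Java int[]: iterable from
    Python, but str() mimics Java's default "[I@<hexhash>" form."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        # py4j array proxies support Python iteration over the elements
        return iter(self._values)

    def __str__(self):
        # Java's Object.toString() for an int[] is "[I@" + hex hash code
        return "[I@%x" % (id(self) & 0xFFFFFFF)


layers = JavaIntArrayProxy([30, 20, 3])
print(layers)        # something like [I@db98c25 — unhelpful
print(list(layers))  # [30, 20, 3]
```

This is why the `getLayers()` call "worked" but printed gibberish: the values were there all along, only the string conversion went through Java. Reading `.layers` on the Python model side skips the proxy entirely.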