apache-spark, pyspark, apache-spark-ml

Extracting MLP Layers from PySpark ParamGrid


I had trouble extracting hyperparameters from a PySpark model fitted with a Pipeline and CrossValidator.

I found the following answer on Stack Overflow: How to extract model hyper-parameters from spark.ml in PySpark?

This was very helpful, and the following line worked for me:

modelOnly.bestModel.stages[-1]._java_obj.parent().getRegParam()
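The line above follows a general pattern: bestModel.stages[-1] is the fitted last pipeline stage, _java_obj is its underlying Java model, and parent() is the Java estimator that produced that model, configured with the selected parameters, so its getters return the chosen values. A minimal sketch of a helper that applies the pattern to several getters at once; the name best_params_from_parent is hypothetical, and _java_obj is a private PySpark attribute that may change between Spark versions.

```python
def best_params_from_parent(cv_model, getter_names):
    """Call each named getter on the estimator that produced the best model.

    Assumes cv_model is a fitted CrossValidatorModel whose bestModel is a
    PipelineModel (as in the question). Hypothetical helper, relying on the
    private _java_obj attribute.
    """
    # Fitted Java model behind the last pipeline stage.
    java_model = cv_model.bestModel.stages[-1]._java_obj
    # The Java estimator that produced this model, with the selected params.
    parent = java_model.parent()
    # getattr on a Py4J object returns a callable Java method, e.g. getRegParam.
    return {name: getattr(parent, name)() for name in getter_names}
```

Usage, with the variables from the code below: print(best_params_from_parent(cvModel, ["getStepSize", "getMaxIter", "getTol"])).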

The new problem is that I'm running an MLP, and when I try to extract the layers I get what looks like a random string of characters instead of a Python list.

Result:

StepSize: 0.03

Layers: [I@db98c25

My code was roughly:

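from pyspark.ml import Pipeline
from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# assembler1, stringIdx, layers1 and layers2 are defined earlier (not shown)
trainer = MultilayerPerceptronClassifier(featuresCol='features',
                                         labelCol='label',
                                         predictionCol='prediction',
                                         maxIter=100,
                                         tol=1e-06,
                                         seed=1331,
                                         layers=layers1,
                                         blockSize=128,
                                         stepSize=0.03,
                                         solver='l-bfgs',
                                         initialWeights=None,
                                         probabilityCol='probability',
                                         rawPredictionCol='rawPrediction')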

pipeline = Pipeline(stages=[assembler1, stringIdx, trainer])

paramGrid = ParamGridBuilder() \
    .addGrid(trainer.maxIter, [10]) \
    .addGrid(trainer.tol, [1e-06]) \
    .addGrid(trainer.stepSize, [0.03]) \
    .addGrid(trainer.layers, [layers2]) \
    .build()

crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=MulticlassClassificationEvaluator(metricName="accuracy"),
                          numFolds=3)

cvModel = crossval.fit(df)

mybestmodel = cvModel.bestModel

java_model = mybestmodel.stages[-1]._java_obj

print("StepSize: ", end='')
print(java_model.parent().getStepSize())
print("Layers: ", end='')
print(java_model.parent().getLayers())

I'm running Spark 2.3.2.

What am I missing?

Thanks :)


Solution

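  • That's not a random string, but the default string representation of the corresponding Java object. getLayers() returns a Java int[], and Java's default toString() prints the class name ([I is the JVM name for int[]) followed by @ and the object's hash code in hexadecimal.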

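    In theory you could convert the Java array yourself, since Py4J exposes it as an iterable JavaArray:

    [x for x in mybestmodel.stages[-1]._java_obj.parent().getLayers()]


    but there is really no need for that, because the fitted Python model already exposes the value as an attribute (from the PySpark API docs):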

    layers

    array of layer sizes including input and output layers.

    New in version 1.6.0.

    In other words:

    mybestmodel.stages[-1].layers
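Combining the two options, a minimal sketch of a helper (the name best_layers is hypothetical) that returns the layer sizes as a plain Python list. It assumes, as in the question, that the last pipeline stage is the fitted MLP model; it uses the layers attribute when that is already a list and otherwise falls back to copying the Java array, which relies on the private _java_obj attribute.

```python
def best_layers(cv_model):
    """Return the layer sizes of the best MLP as a plain Python list.

    Assumes cv_model.bestModel is a PipelineModel whose last stage is the
    fitted MultilayerPerceptronClassificationModel (as in the question).
    """
    mlp_model = cv_model.bestModel.stages[-1]
    layers = getattr(mlp_model, "layers", None)
    if isinstance(layers, (list, tuple)):
        # The Python model's layers attribute already holds the sizes.
        return list(layers)
    # Fallback: getLayers() on the Java parent returns a Py4J JavaArray,
    # which is iterable, so list() copies it into a list of Python ints.
    return list(mlp_model._java_obj.parent().getLayers())
```

Usage: print("Layers:", best_layers(cvModel)).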