python pyspark apache-spark-ml

How to get the best hyperparameters from an MLP pipeline model in PySpark?


I am using the MLP classifier (MultilayerPerceptronClassifier) from pyspark.ml.classification and fitting it to my dataset with cross-validation. I build a grid of hyperparameters with ParamGridBuilder, then use the CrossValidator class to train the model and find the best combination. After training, when I try to read the best hyperparameters from the cross-validation model, I get an error. Could anyone tell me how to get the best hyperparameters?

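from pyspark.ml import Pipeline
from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# 4 input features, two hidden layers (5 and 4 nodes), 3 output classes
layers = [4, 5, 4, 3]
clf = MultilayerPerceptronClassifier(labelCol='label', layers=layers)
pipeline = Pipeline(stages=[clf])

# Hyperparameters to search over, looked up by name on the estimator
x1 = 'stepSize'
x2 = 'maxIter'
paramGrid = ParamGridBuilder() \
    .addGrid(getattr(clf, x1), [0.1, 0.2]) \
    .addGrid(getattr(clf, x2), [5, 10]) \
    .build()

evaluator = MulticlassClassificationEvaluator(labelCol='label',
                                              predictionCol='prediction',
                                              metricName='f1')
crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=evaluator,
                          numFolds=2)
cvModel = crossval.fit(train_data)

# Works for LogisticRegression / RandomForest, fails for the MLP model
cvModel.bestModel.stages[0]._java_obj.getMaxIter()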

Error:

Py4JError: An error occurred while calling o1127.getMaxIter. Trace:
py4j.Py4JException: Method getMaxIter([]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

The call cvModel.bestModel.stages[0]._java_obj.getMaxIter() works when I use logistic regression or random forest classifiers; I only get the error with the MLP classifier. Is there a way to get the best hyperparameters when using the MLP classifier?


Solution

  • I was getting the same error with exactly the same code, and the following line from this post solved the problem for me:

    How to extract model hyper-parameters from spark.ml in PySpark?

    modelOnly.bestModel.stages[-1]._java_obj.parent().getRegParam()
    

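    The part you're missing is the "parent()" call. As your stack trace shows, the fitted MLP model object on the JVM side has no getMaxIter method; parent() returns the estimator that produced the model, and that estimator does carry the hyperparameters. For your MLP case that would be cvModel.bestModel.stages[0]._java_obj.parent().getMaxIter(). Hope this helps!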
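A small sketch building on the answer, with two hypothetical helpers (best_param_value and best_param_map are my own names, not PySpark API). The first tries the getter on the fitted model itself, which as far as I know works for MLP models in PySpark 3.0 and later, and otherwise falls back to the private _java_obj.parent() trick above. The second avoids _java_obj entirely by pairing cvModel.avgMetrics with the param grid, assuming avgMetrics is listed in the same order as the estimatorParamMaps passed to CrossValidator.

```python
def best_param_value(stage, name):
    """Read hyperparameter `name` (e.g. "maxIter") from a fitted pipeline stage."""
    getter = "get" + name[0].upper() + name[1:]  # "maxIter" -> "getMaxIter"
    if hasattr(stage, getter):
        # Newer PySpark (assumed 3.0+): the fitted model exposes the param getters.
        return getattr(stage, getter)()
    # Older PySpark: ask the JVM-side estimator that produced the model.
    # _java_obj is a private attribute, so this may break across versions.
    return getattr(stage._java_obj.parent(), getter)()


def best_param_map(param_maps, avg_metrics, larger_is_better=True):
    """Return the grid point with the best cross-validated metric, keyed by param name."""
    # Assumes avg_metrics[i] is the averaged metric for param_maps[i].
    scored = list(zip(avg_metrics, param_maps))
    pick = max if larger_is_better else min
    _, best = pick(scored, key=lambda pair: pair[0])
    # Spark Param objects have a .name attribute; fall back to str() for plain keys.
    return {getattr(param, "name", str(param)): value for param, value in best.items()}
```

With the question's objects these would be called as best_param_value(cvModel.bestModel.stages[0], 'maxIter') and best_param_map(paramGrid, cvModel.avgMetrics, evaluator.isLargerBetter()). The second form has the advantage of returning every tuned hyperparameter at once, without touching the JVM objects.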