What is maxIter in MultilayerPerceptronClassifier (Spark MLlib)?
1. Does the parameter maxIter tell the optimization algorithm the maximum number of steps it is allowed to take to find the minimum error?
OR
2. Does the parameter maxIter set the maximum number of epochs (the maximum number of times the entire dataset goes through the network)?
class pyspark.ml.classification.MultilayerPerceptronClassifier(featuresCol='features', labelCol='label', predictionCol='prediction', maxIter=100, tol=1e-06, seed=None, layers=None, blockSize=128, stepSize=0.03, solver='l-bfgs', initialWeights=None, probabilityCol='probability', rawPredictionCol='rawPrediction')
Spark's gradient optimizer is built on the RDD treeAggregate function. In each iteration it takes a fraction of the RDD and distributes the gradient computation over the workers; because that fraction is 1 by default, each iteration processes the whole RDD. In this case one iteration can be considered one epoch, so maxIter is effectively the maximum number of epochs (your option 2). This approach keeps the optimization process simple in Spark.

There are other, more advanced deep learning optimizer implementations, such as BigDL, that let you set the batch size and use the BlockManager to compute the distributed gradient aggregation for each iteration. In that case, one iteration corresponds to one mini-batch execution.
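To make the iteration/epoch distinction concrete, here is a minimal pure-Python sketch. It is not Spark's or BigDL's code: the function `train`, its parameters, and the toy data are all made up for illustration, and it uses plain gradient descent rather than L-BFGS. It only counts how many samples each iteration touches.

```python
def train(xs, ys, max_iter, batch_size=None, step_size=0.05):
    """Fit y ~ w * x by gradient descent; return (w, samples_seen).

    Hypothetical helper for illustration only. With batch_size=None every
    iteration uses the whole dataset (full batch), so one iteration is one epoch.
    """
    n = len(xs)
    batch_size = n if batch_size is None else batch_size
    w = 0.0
    samples_seen = 0
    for it in range(max_iter):
        # Pick the next batch_size samples, cycling through the data.
        start = (it * batch_size) % n
        idx = [(start + k) % n for k in range(batch_size)]
        # Gradient of the mean squared error over the selected samples.
        grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in idx) / batch_size
        w -= step_size * grad
        samples_seen += batch_size
    return w, samples_seen


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]  # generated by y = 2 * x

w_full, seen_full = train(xs, ys, max_iter=100)                # full batch
w_mini, seen_mini = train(xs, ys, max_iter=100, batch_size=1)  # mini-batch

epochs_full = seen_full / len(xs)  # passes over the data with full batches
epochs_mini = seen_mini / len(xs)  # passes over the data with mini-batches
print(epochs_full, epochs_mini)
```

With the same `max_iter`, the full-batch run makes one pass over the data per iteration, while the mini-batch run needs `len(xs) / batch_size` iterations to complete a single pass.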