Tags: scala, apache-spark, neural-network, apache-spark-mllib, apache-spark-ml

SparkML MultilayerPerceptron error: java.lang.ArrayIndexOutOfBoundsException


I would like to fit the following model with Spark ML's MultilayerPerceptronClassifier.

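import org.apache.spark.ml.feature.RFormula

val formula = new RFormula()
  .setFormula("vtplus15predict ~ vhisttplus15 + vhistt + vt + vtminus15 + Time + Length + Day")
  .setFeaturesCol("features")
  .setLabelCol("label")

// Fit the formula and keep the transformed DataFrame (features + label columns)
val output = formula.fit(data).transform(data)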

Note: features is a vector column and label is a Double:

root
 |-- features: vector (nullable = true)
 |-- label: double (nullable = false)

I define my MLP estimator as follows:

import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

val layers = Array[Int](6, 5, 8, 1) // I suspect this is where it went wrong

val mlp = new MultilayerPerceptronClassifier()
  .setLayers(layers)
  .setBlockSize(128)
  .setSeed(1234L)
  .setMaxIter(100)

// train the model
val model = mlp.fit(train)

Unfortunately, I got the following error:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 3, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: 11
    at org.apache.spark.ml.classification.LabelConverter$.encodeLabeledPoint(MultilayerPerceptronClassifier.scala:121)
    at org.apache.spark.ml.classification.MultilayerPerceptronClassifier$$anonfun$3.apply(MultilayerPerceptronClassifier.scala:245)
    at org.apache.spark.ml.classification.MultilayerPerceptronClassifier$$anonfun$3.apply(MultilayerPerceptronClassifier.scala:245)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:363)
    at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:935)
    at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:950)
    ...
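To see why the trace points at encodeLabeledPoint, here is a simplified pure-Scala sketch of the same idea. It is not Spark's source, and the object and method names are hypothetical: the label is one-hot encoded into an array whose length is the output layer size, and the label value itself is used as the index. With an output layer of size 1, a label of 11 has nowhere to go.

```scala
// Hypothetical, simplified model of how an MLP classifier turns a class label into a target vector.
object LabelEncodingSketch {
  /** One-hot encode `label` into an array with one slot per output neuron. */
  def oneHot(label: Double, outputSize: Int): Array[Double] = {
    val target = Array.fill(outputSize)(0.0)
    // The label value is used directly as the index, so it must be < outputSize;
    // otherwise this line throws ArrayIndexOutOfBoundsException.
    target(label.toInt) = 1.0
    target
  }

  def main(args: Array[String]): Unit = {
    // Output layer of size 5: label 2.0 fits.
    println(oneHot(2.0, 5).mkString(", ")) // 0.0, 0.0, 1.0, 0.0, 0.0

    // Output layer of size 1 (as in Array(6, 5, 8, 1)): label 11.0 does not fit.
    try oneHot(11.0, 1)
    catch {
      case e: ArrayIndexOutOfBoundsException => println(s"failed: $e")
    }
  }
}
```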


Solution

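  • The solution is to enlarge the output layer. MultilayerPerceptronClassifier one-hot encodes each label into a vector whose length is the last value in layers, using the label itself as the index, so every label must be smaller than that value. With Array[Int](6, 5, 8, 1), a label of 11 is out of range, hence ArrayIndexOutOfBoundsException: 11. In practice, first pick an n large enough to avoid the exception, then look for a smaller value that still works. Shaido suggested finding a suitable n for the output layer: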

    For example, val layers = Array[Int](6, 5, 8, n). This assumes the feature vectors have length 6. – Shaido

    So start with a large n (e.g. n = 100) and narrow it down by hand: n = 50 worked, n = 32 failed with the same error, and n = 35 worked.

    Credit to Shaido.
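Instead of searching by hand, the layer sizes can be derived from the data. The following is a minimal sketch, assuming the labels are non-negative whole numbers (class indices) and that you already know the feature vector length and the largest label; the object and method names are hypothetical. In Spark those two inputs could be read with something like `df.select("features").head.getAs[org.apache.spark.ml.linalg.Vector](0).size` and `df.agg(org.apache.spark.sql.functions.max("label")).head.getDouble(0)`, where `df` is the RFormula output.

```scala
// Hypothetical helper: derive the `layers` array for an MLP classifier from the data's shape.
object MlpLayers {
  /**
   * @param numFeatures length of each feature vector (size of the input layer)
   * @param hidden      sizes of the hidden layers, e.g. Seq(5, 8)
   * @param maxLabel    largest label value in the training data
   * @return input size, hidden sizes, then one output neuron per class (maxLabel + 1)
   */
  def build(numFeatures: Int, hidden: Seq[Int], maxLabel: Double): Array[Int] = {
    require(numFeatures > 0, "feature vectors must be non-empty")
    require(maxLabel >= 0 && maxLabel == math.floor(maxLabel),
      s"labels must be non-negative whole numbers, got max $maxLabel")
    // Labels 0, 1, ..., maxLabel need maxLabel + 1 output neurons.
    val numClasses = maxLabel.toInt + 1
    (Seq(numFeatures) ++ hidden ++ Seq(numClasses)).toArray
  }

  def main(args: Array[String]): Unit = {
    // Suppose there are 6 features, hidden layers (5, 8), and the largest label is 11:
    // the output layer then needs 12 neurons.
    println(build(6, Seq(5, 8), 11.0).mkString("Array(", ", ", ")")) // Array(6, 5, 8, 12)
  }
}
```

The resulting array could be passed to setLayers in place of the hard-coded Array[Int](6, 5, 8, 1). It only guarantees that no label is out of range; if the label values are sparse, every unused index still gets its own output neuron.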