I'm getting the following error when training a logistic regression model using my dataset:
Caused by: java.lang.IllegalArgumentException: requirement failed: Index 0 follows 0 and is not strictly increasing
at scala.Predef$.require(Predef.scala:281)
at org.apache.spark.ml.linalg.SparseVector.$anonfun$new$5(Vectors.scala:629)
at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofInt.foreach(ArrayOps.scala:246)
at org.apache.spark.ml.linalg.SparseVector.<init>(Vectors.scala:628)
at org.apache.spark.ml.linalg.VectorUDT.deserialize(VectorUDT.scala:64)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown Source)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:168)
... 38 more
I'm not sure what this error indicates or where I should start debugging. Could someone familiar with Spark MLlib give me some guidance? Thanks in advance!
You're constructing a sparse vector, and the list of (index, value) tuples contains a duplicate index 0, e.g.:
Vectors.sparse(2, Seq((0, 1d), (0, 1d)))
Spark used to let this slip through, but it appears to validate the indices now: the require check in SparseVector's constructor rejects any index sequence that isn't strictly increasing, which is exactly what the "Index 0 follows 0 and is not strictly increasing" message is reporting.
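One way to guard against this before calling Vectors.sparse is to merge duplicate indices and sort. Here's a minimal sketch in plain Scala (no Spark dependency, so it runs standalone); cleanPairs is a hypothetical helper name, not a Spark API, and summing duplicate values is just one possible merge policy:

```scala
object SparseVectorFix {
  // Merge duplicate indices (summing their values) and sort ascending,
  // so the index sequence is strictly increasing, as SparseVector requires.
  def cleanPairs(pairs: Seq[(Int, Double)]): Seq[(Int, Double)] =
    pairs
      .groupBy(_._1)                                // bucket pairs by index
      .map { case (i, vs) => (i, vs.map(_._2).sum) } // sum values per index
      .toSeq
      .sortBy(_._1)                                 // strictly increasing order

  def main(args: Array[String]): Unit = {
    // The failing input from above: index 0 appears twice.
    val bad = Seq((0, 1d), (0, 1d))
    // cleanPairs(bad) collapses the duplicates; the result is now safe
    // to pass to Vectors.sparse(size, ...).
    println(SparseVectorFix.cleanPairs(bad))
  }
}
```

That said, if you hit this error, deduplicating is usually just masking the real problem: the better fix is to find out why your feature pipeline is emitting the same index twice in the first place.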
I had this exact same issue. It turned out to be a useful exception: it highlighted a bug where two of my model's features were using the same prefix in their values, which produced the duplicate index.