A small question about a Spark exception I am getting, please.
I have a very straightforward dataset:
myCoolDataset.show();
+----------+-----+
| time|value|
+----------+-----+
|1621900800| 43|
|1619568000| 41|
|1620432000| 41|
|1623974400| 42|
|1620604800| 41|
[truncated]
|1621296000| 42|
|1620691200| 44|
|1620345600| 41|
|1625702400| 44|
+----------+-----+
only showing top 20 rows
I would like to perform a linear regression on it, in order to predict the next value for a future time. This is what I tried:
VectorAssembler vectorAssembler = new VectorAssembler().setInputCols(new String[]{"time", "value"}).setOutputCol("features");
Dataset<Row> vectorData = vectorAssembler.transform(myCoolDataset);
LinearRegression lr = new LinearRegression();
LinearRegressionModel lrModel = lr.fit(vectorData); // issue here
Unfortunately, at run time, I am getting this exception:
Exception in thread "main" java.lang.IllegalArgumentException: label does not exist. Available: time, value, features
at org.apache.spark.sql.types.StructType.$anonfun$apply$1(StructType.scala:278)
at scala.collection.immutable.Map$Map3.getOrElse(Map.scala:181)
at org.apache.spark.sql.types.StructType.apply(StructType.scala:277)
at org.apache.spark.ml.util.SchemaUtils$.checkNumericType(SchemaUtils.scala:75)
at org.apache.spark.ml.PredictorParams.validateAndTransformSchema(Predictor.scala:54)
at org.apache.spark.ml.PredictorParams.validateAndTransformSchema$(Predictor.scala:47)
at org.apache.spark.ml.regression.LinearRegression.org$apache$spark$ml$regression$LinearRegressionParams$$super$validateAndTransformSchema(LinearRegression.scala:185)
May I ask what is the root cause, and how to fix this please?
Thank you
MLlib regressors expect to be told which column contains the label (the value you want to predict). By default, `LinearRegression` looks for a column named `label`. In your example, no such column exists, hence the exception.
I see these solutions:
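For instance, you can point the regression at your existing column with `setLabelCol`. Note that `value` should then also be removed from the assembler's input columns, because the label must not leak into the features vector. A sketch, reusing `myCoolDataset` from your snippet:

```java
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.ml.regression.LinearRegression;
import org.apache.spark.ml.regression.LinearRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Assemble only "time" into the features vector; "value" is the target,
// so it must not be part of the features.
VectorAssembler vectorAssembler = new VectorAssembler()
        .setInputCols(new String[]{"time"})
        .setOutputCol("features");
Dataset<Row> vectorData = vectorAssembler.transform(myCoolDataset);

// Tell the regression to use "value" as the label instead of the
// default column name "label".
LinearRegression lr = new LinearRegression().setLabelCol("value");
LinearRegressionModel lrModel = lr.fit(vectorData);
```

Alternatively, you can rename the column to match the default, e.g. `myCoolDataset.withColumnRenamed("value", "label")`, and leave the `LinearRegression` configuration untouched.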