I am using Spark (core and MLlib) version 2.2.0 with Scala.
I successfully saved a CrossValidator model with logistic regression. Below is the code that I used:
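import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.CrossValidator

// lr (the LogisticRegression estimator) and paramGrid are defined earlier (not shown)
val cv = new CrossValidator()
  .setEstimator(lr)
  .setEvaluator(new BinaryClassificationEvaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(5)
val model = cv.fit(trainingData)
model.write.overwrite().save("./cvmodel")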
After that, I try to use it on another dataset with the code below:
import org.apache.spark.ml.tuning.CrossValidatorModel

val model = CrossValidatorModel.read.load("./cvmodel")
val cleanData = DataApi.cleanData(dataset, spark) // custom method
val preparedData = DataApi.oneHotEncodingData(cleanData).select("label", "features") // custom method
val predict_dataset = model.transform(preparedData)
printResult(predict_dataset) // custom method that uses metrics to print the statistics of the result
However, when I use a dataset whose size differs from the test data (whether larger or smaller), I get this error:
java.lang.IllegalArgumentException: requirement failed: BLAS.dot(x: Vector, y:Vector) was given Vectors with non-matching sizes: x.size = 1178, y.size = 9921
The code works with a dataset of the same size. Therefore, I would like to know whether it is possible to use the saved model with a dataset of a different size without fitting it again and, if so, how.
Thank you for your help.
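I found the cause of this error. My one-hot encoding step used some pipelines that I fitted but did not save the way I saved my CrossValidatorModel, so they were re-fitted on each new dataset and could produce feature vectors of a different length than the model was trained on. What I had to do was save those fitted pipelines as well and load them when preparing new data: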
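The original snippet for this step is not shown, so here is only a minimal sketch of the idea, not the exact code I used. It assumes a single categorical column named "category", an encoding pipeline made of StringIndexer, OneHotEncoder and VectorAssembler, and hypothetical helper names (`fitAndSave`, `score`) and paths; the real stages live inside the custom `DataApi.oneHotEncodingData` method. It needs the spark-mllib 2.2.0 dependency on the classpath.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}
import org.apache.spark.ml.tuning.CrossValidatorModel
import org.apache.spark.sql.DataFrame

object EncodingPipelineSketch {

  // Hypothetical stages and column names; replace with the real ones.
  // With the default handleInvalid ("error"), StringIndexerModel throws on
  // categories that were not present when it was fitted.
  val encodingPipeline: Pipeline = new Pipeline().setStages(Array(
    new StringIndexer()
      .setInputCol("category")
      .setOutputCol("categoryIndex"),
    new OneHotEncoder()
      .setInputCol("categoryIndex")
      .setOutputCol("categoryVec"),
    new VectorAssembler()
      .setInputCols(Array("categoryVec"))
      .setOutputCol("features")
  ))

  // Training time: fit the encoding stages once on the training data
  // and persist the fitted PipelineModel next to the CrossValidatorModel.
  def fitAndSave(cleanTrainingData: DataFrame, encoderPath: String): PipelineModel = {
    val fitted = encodingPipeline.fit(cleanTrainingData)
    fitted.write.overwrite().save(encoderPath)
    fitted
  }

  // Scoring time: load the fitted stages instead of re-fitting them, so the
  // "features" vectors have the same length the model was trained on.
  def score(cleanNewData: DataFrame, encoderPath: String, modelPath: String): DataFrame = {
    val encoder = PipelineModel.load(encoderPath)
    val cvModel = CrossValidatorModel.load(modelPath)
    cvModel.transform(encoder.transform(cleanNewData))
  }
}
```

For example, `EncodingPipelineSketch.fitAndSave(cleanTrainingData, "./encoder")` would be called once before `cv.fit`, and `EncodingPipelineSketch.score(cleanData, "./encoder", "./cvmodel")` for every later dataset ("./encoder" is an assumed path). The point is that the category-to-index mapping is fixed at training time, so the encoded vector length no longer depends on which categories happen to appear in the new dataset.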
When doing this, I no longer had issues.