I noticed there are two LinearRegressionModel classes in Spark ML: one in the ML package (spark.ml) and another in the MLlib package (spark.mllib).
These two are implemented quite differently; for example, the one from MLlib implements Serializable, while the other one does not.
By the way, the same is true of RandomForestModel and Word2Vec.
Why are there two classes? Which is the "right" one? And is there a way to convert one into the other?
org.apache.spark.mllib (o.a.s.mllib) contains the old RDD-based API, while o.a.s.ml contains the new API built around Dataset and ML Pipelines. ml and mllib reached feature parity in 2.0.0, and mllib is slowly being deprecated (this has already happened for linear regression) and will most likely be removed in the next major release.
So unless your goal is backward compatibility, the "right choice" is o.a.s.ml.
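To make the "Dataset and ML Pipelines" style of o.a.s.ml more concrete, here is a minimal sketch of fitting a linear regression with that API. The object name, the column names (x1, x2, label), the toy data, and the local master setting are all made up for illustration; only the Spark classes and methods come from the library itself.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.sql.SparkSession

// Hypothetical object name; any entry point would do.
object MlLinearRegressionSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption: not a cluster setup).
    val spark = SparkSession.builder()
      .appName("ml-linear-regression-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy data (made up): label = 2 * x1 + 3 * x2
    val df = Seq(
      (1.0, 2.0, 8.0),
      (2.0, 1.0, 7.0),
      (3.0, 4.0, 18.0),
      (4.0, 3.0, 17.0)
    ).toDF("x1", "x2", "label")

    // spark.ml estimators read a single vector column of features.
    val assembler = new VectorAssembler()
      .setInputCols(Array("x1", "x2"))
      .setOutputCol("features")

    val lr = new LinearRegression()
      .setFeaturesCol("features")
      .setLabelCol("label")

    // Stages are chained in a Pipeline and fitted on the DataFrame directly.
    val pipeline = new Pipeline().setStages(Array(assembler, lr))
    val model = pipeline.fit(df)

    // The last stage of the fitted pipeline is the spark.ml LinearRegressionModel.
    val lrModel = model.stages.last.asInstanceOf[LinearRegressionModel]
    println(s"coefficients = ${lrModel.coefficients}, intercept = ${lrModel.intercept}")

    // Predictions are added as a new column rather than returned as an RDD.
    model.transform(df).select("features", "label", "prediction").show()

    spark.stop()
  }
}
```

The point of the sketch is the shape of the workflow: input and output are DataFrames, feature preparation and the estimator are pipeline stages, and the fitted model is a transformer. The mllib counterpart works on RDDs of labeled points instead.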