apache-spark, pyspark, apache-spark-mllib, apache-spark-ml

Regression in PySpark: which library to use?


What are the differences between "pyspark.mllib.regression" and "pyspark.ml.regression"?

Which one should be used?


Solution

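  • That depends on your Spark version, but "pyspark.ml" is the officially recommended API. The main difference is that "pyspark.mllib" is built on RDDs, while "pyspark.ml" is built on DataFrames. The Spark documentation says: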

    As of Spark 2.0, the RDD-based APIs in the spark.mllib package have entered maintenance mode. The primary Machine Learning API for Spark is now the DataFrame-based API in the spark.ml package.

    Hope this helps!