
Can the alpha parameter of the ALS.trainImplicit() be greater than 1?


I have been testing the example code at http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#explicit-vs-implicit-feedback with my own data in place.

When I put alpha greater than 1, as is suggested by the source paper at

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4781121

I get the error

    Py4JError: An error occurred while calling o629.trainImplicitALSModel. Trace: py4j.Py4JException: Method trainImplicitALSModel([class org.apache.spark.api.java.JavaRDD, class java.lang.Integer, class java.lang.Integer, class java.lang.Double, class java.lang.Integer, class java.lang.Integer, class java.lang.Boolean, null]) does not exist

Is alpha limited to values less than 1 in PySpark?


Solution

  • PySpark doesn't enforce any limitations beyond those already enforced by the Scala backend, but the types matter. That means:

    ALS.trainImplicit(ratings, rank, numIterations, alpha=100.0)
    

    is not the same as

    ALS.trainImplicit(ratings, rank, numIterations, alpha=100)
    

    with the latter being invalid due to a type mismatch: a Python float is represented as a java.lang.Double, while a Python int is represented as a java.lang.Integer. The Scala method expects a Double for alpha, so passing 100 (an int) produces the `does not exist` error above, whereas 100.0 works regardless of how large the value is.
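
    One defensive pattern is to coerce the parameters to the expected Python types before the call crosses the Py4J bridge. A minimal sketch; `coerce_als_params` is a hypothetical helper, not part of the PySpark API, and the parameter names mirror `ALS.trainImplicit`:

    ```python
    # Hypothetical helper (not part of PySpark) that coerces parameters to
    # the Python types Py4J maps onto the Scala signature:
    # Python int -> java.lang.Integer, Python float -> java.lang.Double.
    def coerce_als_params(rank, iterations, lambda_=0.01, alpha=1.0):
        """Return (rank, iterations, lambda_, alpha) with JVM-compatible types."""
        return int(rank), int(iterations), float(lambda_), float(alpha)

    # alpha=100 (an int) becomes 100.0 (a float), which maps to java.lang.Double,
    # so values greater than 1 are fine:
    rank, iters, lam, alpha = coerce_als_params(10, 10, alpha=100)
    # model = ALS.trainImplicit(ratings, rank, iters, lambda_=lam, alpha=alpha)
    ```

    With the coercion in place, the choice of alpha is unconstrained on the Python side; any limit would have to come from the Scala implementation itself.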