
Can the alpha parameter of the ALS.trainImplicit() be greater than 1?


I have been testing the example code at http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#explicit-vs-implicit-feedback with my own data in place.

When I put alpha greater than 1, as is suggested by the source paper at

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4781121

I get the error

    Py4JError: An error occurred while calling o629.trainImplicitALSModel. Trace: py4j.Py4JException: Method trainImplicitALSModel([class org.apache.spark.api.java.JavaRDD, class java.lang.Integer, class java.lang.Integer, class java.lang.Double, class java.lang.Integer, class java.lang.Integer, class java.lang.Boolean, null]) does not exist

Is alpha limited to values less than 1 in PySpark?


Solution

  • PySpark doesn't enforce any limitations beyond those already enforced by the Scala backend, but the types matter. That means:

    ALS.trainImplicit(ratings, rank, numIterations, alpha=100.0)
    

    is not the same as

    ALS.trainImplicit(ratings, rank, numIterations, alpha=100)
    

    with the latter being invalid due to a type mismatch: a Python float is represented as a java.lang.Double, while a Python int is represented as a java.lang.Integer. The Scala method expects a Double for alpha, so passing 100 (an int) produces the `does not exist` error above, whereas 100.0 works regardless of how large the value is.
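
    One defensive pattern is to coerce the parameters to the expected Python types before the call crosses the Py4J bridge. A minimal sketch; `coerce_als_params` is a hypothetical helper, not part of the PySpark API, and the parameter names mirror `ALS.trainImplicit`:

    ```python
    # Hypothetical helper (not part of PySpark) that coerces parameters to
    # the Python types Py4J maps onto the Scala signature:
    # Python int -> java.lang.Integer, Python float -> java.lang.Double.
    def coerce_als_params(rank, iterations, lambda_=0.01, alpha=1.0):
        """Return (rank, iterations, lambda_, alpha) with JVM-compatible types."""
        return int(rank), int(iterations), float(lambda_), float(alpha)

    # alpha=100 (an int) becomes 100.0 (a float), which maps to java.lang.Double,
    # so values greater than 1 are fine:
    rank, iters, lam, alpha = coerce_als_params(10, 10, alpha=100)
    # model = ALS.trainImplicit(ratings, rank, iters, lambda_=lam, alpha=alpha)
    ```

    With the coercion in place, the choice of alpha is unconstrained on the Python side; any limit would have to come from the Scala implementation itself.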