I have been testing the example code at http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#explicit-vs-implicit-feedback, substituting my own data.
When I set alpha to a value greater than 1, as suggested by the source paper at
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4781121
I get the error
Py4JError: An error occurred while calling o629.trainImplicitALSModel. Trace: py4j.Py4JException: Method trainImplicitALSModel([class org.apache.spark.api.java.JavaRDD, class java.lang.Integer, class java.lang.Integer, class java.lang.Double, class java.lang.Integer, class java.lang.Integer, class java.lang.Boolean, null]) does not exist
Is alpha limited to values less than 1 in PySpark?
PySpark doesn't enforce any limits beyond those already enforced by the Scala backend, but the types of the arguments matter. This means that:
ALS.trainImplicit(ratings, rank, numIterations, alpha=100.0)
is not the same as
ALS.trainImplicit(ratings, rank, numIterations, alpha=100)
with the latter being invalid due to a type mismatch: a Python float is represented as a java.lang.Double, while a Python int is represented as a java.lang.Integer. Since the Scala backend expects a Double for alpha, passing an int means no matching method signature is found, which produces the Py4JError above.
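For reference, here is a minimal sketch of a call that avoids the error, assuming a live SparkContext sc and the RDD-based pyspark.mllib API; the (user, product, rating) triplets are made-up placeholder data:

    from pyspark.mllib.recommendation import ALS, Rating

    # Made-up implicit-feedback data: (user, product, strength of observation).
    ratings = sc.parallelize([
        Rating(0, 0, 4.0),
        Rating(0, 1, 2.0),
        Rating(1, 1, 3.0),
    ])

    rank = 10
    numIterations = 10

    # Works: 100.0 is a Python float, so Py4J marshals it to a
    # java.lang.Double and the matching Scala method is found.
    model = ALS.trainImplicit(ratings, rank, numIterations, alpha=100.0)

    # Raises the Py4JError above: 100 is a Python int, marshalled to a
    # java.lang.Integer, so no method with that signature exists.
    # model = ALS.trainImplicit(ratings, rank, numIterations, alpha=100)

If alpha comes from a config file or command-line argument, coercing it with float() before the call avoids the mismatch regardless of how it was parsed.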