
recommendProductsForUsers not working in Spark 1.5.0


Given the following:

from pyspark import SparkContext, SparkConf
from pyspark.mllib.recommendation import ALS, Rating
r1 = (1, 1, 1.0)
r2 = (1, 2, 2.0)
r3 = (2, 1, 2.0)
ratings = sc.parallelize([r1, r2, r3])
model = ALS.trainImplicit(ratings, 1, seed=10)

res = model.recommendProductsForUsers(2)

I'd like to compute the top k products for every user. In general, there can be many users and products, so it would be too expensive to create an RDD and call recommendProducts for every user individually.

According to the Spark 1.5.0 documentation, recommendProductsForUsers should do the job. However, I am getting the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-c65e6875ea5b> in <module>()
      7 model = ALS.trainImplicit(ratings, 1, seed=10)
      8 
----> 9 res = model.recommendProductsForUsers(2)

AttributeError: 'MatrixFactorizationModel' object has no attribute 'recommendProductsForUsers'

And, in fact, recommendProductsForUsers does not appear when listing the methods of MatrixFactorizationModel:

print dir(model)
['__class__', '__del__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_java_loader_class', '_java_model', '_load_java', '_sc', 'call', 'load', 'predict', 'predictAll', 'productFeatures', 'rank', 'recommendProducts', 'recommendUsers', 'save', 'userFeatures']

Solution

  • You're looking at the wrong documentation. The fact that an operation is implemented in the Scala or Java API doesn't mean it is exposed to PySpark. If you check the PySpark 1.5 API docs, you'll see that they don't provide the requested method.

    recommendUsersForProducts and recommendProductsForUsers were introduced in PySpark 1.6 with SPARK-10535. If you are stuck on 1.5, a workaround sketch follows below.
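
If upgrading is not an option, one possibility (not part of the original answer) is to rank products per user directly from the learned factors, since the ALS prediction is the dot product of the user and product factor vectors. The following is a minimal sketch assuming the model from the question; k, dot, and top_k are hypothetical helpers introduced here, and the cartesian step scores every (user, product) pair, so it is only feasible for moderate numbers of users and products.

# Workaround sketch for PySpark 1.5: rank products per user from the factors
# instead of recommendProductsForUsers. Assumes `model` from the question.
k = 2  # number of products to keep per user

def dot(u, p):
    # predicted preference = dot product of user and product factor vectors
    return sum(x * y for x, y in zip(u, p))

def top_k(scored):
    # keep the k highest-scoring (product, score) pairs for one user
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

recs = (model.userFeatures()                      # RDD of (userId, factors)
        .cartesian(model.productFeatures())       # all (user, product) pairs
        .map(lambda up: (up[0][0], (up[1][0], dot(up[0][1], up[1][1]))))
        .groupByKey()
        .mapValues(top_k))

print recs.collect()

On PySpark 1.6 or later, the call from the question, res = model.recommendProductsForUsers(2), works as written and returns an RDD keyed by user.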