I am trying to use ALS, but currently my data is limited to information about what user bought. So I was trying to fill ALS from Apache Spark with Ratings equal 1 (one) when user X bought item Y (and only such information I provided to that algorithm).
I was trying to learn it (divided data to train/test/validation) or was trying just to learn on all data but at the end I was getting prediction with extremely similar values for any pair user-item (values differentiated on 5th or 6th place after comma like 0,86001 and 0,86002).
I was thinking about that and maybe it is because I can provide only rating equal 1 so does ALS cannot be used in such extreme situation?
Is there any trick with ratings so I could use to fix such problem (I have only information's about what was bought - later I am going to get more data, but at a moment I have to use some kind of collaborative filtering until I will acquire more data - in other words I need to show user some kind of recommendation on startup page I choose ALS for startup page but maybe I use something else, what exactly)?
Ofcourse I was changing parameters like iterations, lambda, rank.
In this case, the key is that you must use trainImplicit
, which ignores Rating
's value. Otherwise you're asking it to predict ratings in a world where everyone rates everything 1. The right answer is invariably 1, so all your answers are similar.