I found a couple of posts where users are wondering why they are receiving NaN values in their predictions when using ALS. I ran into the same problem and seemingly found the answer, an implemented solution with detailed discussion in the docs:
Note: there was a working link here to documentation on coldStartStrategy()
however seemingly due to my question the documentation was removed.
I thought this would solve the problem. However, even after updating to Spark 2.1.1 (it wasn't working on 2.1.0), I continue to receive the same error:
TypeError: __init__() got an unexpected keyword argument 'coldStartStrategy'
Here is where I attempt to use the argument:
full_train, full_test = ugr_df.randomSplit([0.7, 0.3], seed=0)
als = ALS(rank=rank, maxIter=maxIter, regParam=lmbda,
          userCol="user_id", itemCol="game_id", seed=seed,
          ratingCol="rating", coldStartStrategy="drop")
optimized_model = als.fit(full_train)
I am importing ALS in this way:
from pyspark.ml.recommendation import ALS
My code works fine when I take out the cold start argument. From what I can see in the docs, I am implementing it correctly.
If I go without it, can I safely do the following for the same effect? That is, is the following code equivalent to the coldStartStrategy argument?
predictions = optimized_model.transform(full_test)
predictions_drop = predictions.dropna()
I would then go on to use the predictions_drop df for regression analysis.
coldStartStrategy was introduced with SPARK-14489 in Spark 2.2, which hasn't been released yet.
If you want to use it you have to build Spark from source or use developer builds.
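If the same script has to run on both 2.1.x and a 2.2 build, one option is to pass the argument only when the running version accepts it. This is a minimal sketch: supports_cold_start and als_kwargs are hypothetical helper names, not part of PySpark, and the version string is assumed to come from spark.version on an active SparkSession.

```python
def supports_cold_start(version):
    """Return True if a Spark version string is 2.2 or newer."""
    # Only major.minor are compared; a suffix such as "-SNAPSHOT" sits
    # after the third component and is ignored.
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 2)


def als_kwargs(version, **kwargs):
    """Add coldStartStrategy="drop" only when the running Spark accepts it."""
    if supports_cold_start(version):
        kwargs["coldStartStrategy"] = "drop"
    return kwargs
```

Assuming spark is your session, the estimator could then be built as ALS(**als_kwargs(spark.version, rank=rank, maxIter=maxIter, regParam=lmbda, userCol="user_id", itemCol="game_id", ratingCol="rating", seed=seed)). On versions before 2.2 you would still drop the NaN rows after transform.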
Calling na.drop should have the same effect as using the drop strategy, which is implemented internally as:
case ALSModel.Drop =>
predictions.na.drop("all", Seq($(predictionCol)))
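One difference from the predictions.dropna() call in the question: with no arguments, dropna() removes a row if any column is null or NaN, while the snippet above only looks at the prediction column. A rough Python counterpart, as a sketch: drop_cold_start is a hypothetical helper name, and "prediction" is assumed to be the prediction column (the ALS default).

```python
def drop_cold_start(predictions, prediction_col="prediction"):
    """Drop rows whose prediction is null or NaN; other columns are ignored."""
    # how="all" over a one-column subset removes a row only when that
    # single column is missing.
    return predictions.na.drop(how="all", subset=[prediction_col])
```

With the names from the question, this would be used as predictions_drop = drop_cold_start(optimized_model.transform(full_test)). If no other column in full_test contains nulls, plain dropna() gives the same rows.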