machine-learning apache-spark-mllib recommendation-engine data-science

In Spark: MatrixFactorizationModel.scala “recommendProductsForUsers” function takes a very long time to complete


I have a 9-node cluster, and each node has the following configuration:

[image: node configuration]

I’m trying to generate recommendations for all the users in a MatrixFactorizationModel using the 'recommendProductsForUsers' function. It takes a very long time to complete (e.g., for 1 month of data it takes approximately 34 hours). Is this because it iterates over the matrix multiple times?

How can I reduce the execution time?

This is my spark-submit configuration:

spark-submit --jars $JAR_LOC --class com.collaborativefiltering.CustomerCollaborativeJob --driver-memory 5G --num-executors 7 --executor-cores 2 --executor-memory 20G --master yarn-client cust_rec/cust-rec.jar --period 1month --out /PATH --rank 50 --numIterations 2 --lambda 0.25 --alpha 300 --topK 20
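
As a rough sanity check on the parallelism this command requests, the arithmetic can be sketched in a few lines of Python. The values are copied from the flags above; the variable names are illustrative only, and the one-task-per-core figure assumes Spark's default of one CPU per task. Note that with 7 executors, at most 7 of the 9 nodes host an executor.

```python
# Values copied from the spark-submit flags above.
num_executors = 7        # --num-executors
executor_cores = 2       # --executor-cores
executor_memory_gb = 20  # --executor-memory 20G

# Assumption: each core runs one task at a time (default spark.task.cpus = 1).
task_slots = num_executors * executor_cores
total_executor_memory_gb = num_executors * executor_memory_gb

print(f"concurrent task slots: {task_slots}")
print(f"total executor memory: {total_executor_memory_gb}G")
```

So the job can run at most 14 tasks at once, however many partitions the recommendation stage produces.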

Thank you very much in advance.


Solution

  • I found that recommendProductsForUsers in MatrixFactorizationModel runs through multiple iterations, so the computation time is high. Once I started running my jobs in the cloud, I tested the job with more nodes and more Spark executors. It worked: the job completed within 4 hours.
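
For context on why the call is expensive: as far as I understand it, recommending for all users means scoring every user against every product (a dot product of two rank-length factor vectors) and then keeping the top K per user, so the work grows roughly with users × products × rank. The sketch below is plain Python, not Spark's actual implementation; the function name recommend_top_k and the tiny factor dictionaries are made up purely for illustration.

```python
import heapq


def recommend_top_k(user_factors, product_factors, k):
    """Brute-force top-k products per user, ranked by dot-product score."""
    recommendations = {}
    for user_id, u in user_factors.items():
        # Score this user against every product: O(products * rank) per user.
        scored = (
            (sum(a * b for a, b in zip(u, p)), product_id)
            for product_id, p in product_factors.items()
        )
        # Keep only the k highest-scoring products.
        top = heapq.nlargest(k, scored)
        recommendations[user_id] = [(pid, score) for score, pid in top]
    return recommendations


# Hypothetical rank-2 factors, purely for illustration.
rank = 2
users = {1: [1.0, 0.0], 2: [0.0, 1.0]}
products = {10: [0.9, 0.1], 11: [0.2, 0.8], 12: [0.5, 0.5]}

result = recommend_top_k(users, products, k=2)

# Multiply-adds performed across all pairs: users * products * rank.
pair_cost = len(users) * len(products) * rank
print(result)
print(pair_cost)
```

Every user-product pair is scored, so the work is spread over many independent dot products. That is consistent with the speed-up seen after adding nodes and executors.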