SparkR Error in UseMethod("predict")


I am following the ALS example here, but running it in distributed mode, e.g.

Sys.setenv("SPARKR_SUBMIT_ARGS"="--master yarn sparkr-shell")
spark <- sparkR.session(master = "yarn",
                    sparkConfig = list(
                      spark.driver.memory = "2g",
                      spark.driver.extraJavaOptions =
                        paste("-Dhive.metastore.uris=",
                              Sys.getenv("HIVE_METASTORE_URIS"),
                              " -Dspark.executor.instances=",
                              Sys.getenv("SPARK_EXECUTORS"),
                              " -Dspark.executor.cores=",
                              Sys.getenv("SPARK_CORES"),
                              sep = "")
                    ))


ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),list(2, 1, 1.0), list(2, 2, 5.0))
df <- createDataFrame(ratings, c("user", "item", "rating"))
model <- spark.als(df, "rating", "user", "item")
stats <- summary(model)
userFactors <- stats$userFactors
itemFactors <- stats$itemFactors
# make predictions
summary(model)
predicted <- predict(object=model, data=df)

I get the following error:

Error in UseMethod("predict") : 
  no applicable method for 'predict' applied to an object of class "ALSModel"

Looking at the source for 2.1.1, the predict method seems to exist, and the summary() function that is defined directly above it works just fine.

I have tried with Spark 2.1.0, 2.1.1, and 2.2.0-rc6, all of which give the same result. This is also not limited to the ALS model; calling predict() on any model gives the same error.

I also get the same error when I run it in local mode, e.g.

spark <- sparkR.session("local[*]")

Has anybody come across this problem before?


Solution

  • Although I have not reproduced your exact error (I get a different one), the problem is most probably the second argument of your predict call, which should be newData, not data (see the documentation).

    Here is an adaptation of your code for Spark 2.2.0, run locally from RStudio:

    library(SparkR, lib.loc = "/home/ctsats/spark-2.2.0-bin-hadoop2.7/R/lib") # change the path accordingly here
    
    sparkR.session(sparkHome = "/home/ctsats/spark-2.2.0-bin-hadoop2.7")      # and here
    
    ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),list(2, 1, 1.0), list(2, 2, 5.0))
    df <- createDataFrame(ratings, c("user", "item", "rating"))
    model <- spark.als(df, "rating", "user", "item")
    stats <- summary(model)
    userFactors <- stats$userFactors
    itemFactors <- stats$itemFactors
    # make predictions
    summary(model)
    predicted <- predict(object=model, newData=df)  # newData here
    showDF(predicted)
    # +----+----+------+----------+
    # |user|item|rating|prediction|
    # +----+----+------+----------+
    # | 1.0| 1.0|   3.0|  2.810426|
    # | 2.0| 1.0|   1.0| 1.0784092|
    # | 0.0| 1.0|   2.0|  1.997412|
    # | 1.0| 2.0|   4.0| 3.9731808|
    # | 2.0| 2.0|   5.0| 4.8602753|
    # | 0.0| 0.0|   4.0| 3.8844662|
    # +----+----+------+----------+
    

    A plain positional call, predict(model, df), will also work, since the second argument is then matched by position rather than by name.
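
To see why the argument name matters without needing a Spark installation, here is a minimal sketch in plain R. It assumes that SparkR's predict is an S4 generic whose model methods take (object, newData); ToyModel and toyPredict are hypothetical stand-ins made up for illustration, not SparkR names. The last part shows that the UseMethod error text from the question is what base R's S3 predict produces when it is handed an S4 object it has no method for. That may hint that stats::predict was being reached instead of the SparkR one, but I have not confirmed this for your setup.

```r
library(methods)

# Hypothetical stand-in for a SparkR model class (not part of SparkR).
setClass("ToyModel", slots = c(offset = "numeric"))

# Hypothetical S4 generic with the assumed shape of a predict generic:
# the object to dispatch on, plus dots for method-specific arguments.
setGeneric("toyPredict", function(object, ...) standardGeneric("toyPredict"))

# The method names its second argument newData, so callers must either
# pass it positionally or use exactly that name.
setMethod("toyPredict", signature(object = "ToyModel"),
          function(object, newData) {
            newData + object@offset
          })

m <- new("ToyModel", offset = 1)

# Correct name: dispatch succeeds and the method runs.
ok <- toyPredict(m, newData = c(1, 2, 3))

# Positional call: also fine.
ok_positional <- toyPredict(m, c(1, 2, 3))

# Wrong name: the method has no 'data' argument, so the call fails.
# The error message is captured instead of stopping the script.
bad <- tryCatch(toyPredict(m, data = c(1, 2, 3)),
                error = function(e) conditionMessage(e))

# Calling the S3 generic from stats on an S4 object with no S3 method
# fails in UseMethod, which is the kind of error shown in the question.
s3_msg <- tryCatch(stats::predict(m, c(1, 2, 3)),
                   error = function(e) conditionMessage(e))

print(ok)
print(bad)
print(s3_msg)
```

If the last message is the one you see in a SparkR session, it may be worth checking which predict is actually being called, for example by calling SparkR::predict(model, df) explicitly.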