Tags: scala, apache-spark, cross-validation, xgboost, apache-spark-ml

Setting the scalePosWeight parameter for the Spark XGBoost model in a CV grid


I am trying to tune my XGBoost model on Spark using Scala. My XGBoost parameter grid is as follows:

val xgbParamGrid = (new ParamGridBuilder()
                .addGrid(xgb.maxDepth, Array(8, 16))
                .addGrid(xgb.minChildWeight, Array(0.5, 1, 2))
                .addGrid(xgb.alpha, Array(0.8, 0.9, 1))
                .addGrid(xgb.lambda, Array(0.8, 1, 2))
                .addGrid(xgb.scalePosWeight, Array(1, 5, 9))
                .addGrid(xgb.subSample, Array(0.5, 0.8, 1))
                .addGrid(xgb.eta, Array(0.01, 0.1, 0.3, 0.5))
                .build())

The call to the cross validator is as follows:

val evaluator = (new BinaryClassificationEvaluator()
                  .setLabelCol("label")
                  .setRawPredictionCol("prediction")
                  .setMetricName("areaUnderPR"))

val cv = (new CrossValidator()
          .setEstimator(pipeline_model_xgb)
          .setEvaluator(evaluator)
          .setEstimatorParamMaps(xgbParamGrid)
          .setNumFolds(10))

val xgb_model = cv.fit(train)

I am getting the following error just for the scalePosWeight parameter:

error: type mismatch;
found   : org.apache.spark.ml.param.DoubleParam
required: org.apache.spark.ml.param.Param[AnyVal]
Note: Double <: AnyVal (and org.apache.spark.ml.param.DoubleParam <: org.apache.spark.ml.param.Param[Double]), but class Param is invariant in type T.
You may wish to define T as +T instead. (SLS 4.5)
                              .addGrid(xgb.scalePosWeight, Array(1, 5, 9))
                                           ^

Based on my search, the message "You may wish to define T as +T instead" is common, but I am not sure how to fix it here. Thanks for your help!


Solution

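  • I ran into the same issue when setting the array for minChildWeight, where my array contained only Int values. As the error message shows, scalePosWeight is a DoubleParam, so addGrid needs an Array[Double], whereas Array(1, 5, 9) is inferred as Array[Int]. The solution that worked (for both scalePosWeight and minChildWeight) is to pass an Array of Doubles by writing the values as floating-point literals: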

    .addGrid(xgb.scalePosWeight, Array(1.0, 5.0, 9.0))
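To see why the literal types matter, here is a minimal sketch that runs without Spark. The classes `Param` and `DoubleParam` and the object `GridSketch` below are hypothetical stand-ins that only imitate the shape of the Spark ML types named in the error message (an invariant `Param[T]`, plus a generic and a Double-specific `addGrid`); they are not the real Spark classes, and the real `ParamGridBuilder` may differ in detail.

```scala
// Hypothetical stand-ins, NOT the real Spark classes.
// Param is invariant in T, which is what the compiler error complains about.
class Param[T](val name: String)
class DoubleParam(name: String) extends Param[Double](name)

object GridSketch {
  // Generic variant: the element type T must match the param's T exactly.
  def addGrid[T](param: Param[T], values: Iterable[T]): Seq[(String, T)] =
    values.toSeq.map(v => param.name -> v)

  // Double-specific variant: only accepts an Array[Double].
  def addGrid(param: DoubleParam, values: Array[Double]): Seq[(String, Double)] =
    addGrid[Double](param, values.toSeq)

  def main(args: Array[String]): Unit = {
    val scalePosWeight = new DoubleParam("scalePosWeight")

    // Does not compile: Array(1, 5, 9) is an Array[Int], so neither variant fits.
    // addGrid(scalePosWeight, Array(1, 5, 9))

    // Three equivalent ways to supply an Array[Double]:
    val fromLiterals  = addGrid(scalePosWeight, Array(1.0, 5.0, 9.0))
    val fromTypedCall = addGrid(scalePosWeight, Array[Double](1, 5, 9))
    val fromMapping   = addGrid(scalePosWeight, Array(1, 5, 9).map(_.toDouble))

    assert(fromLiterals == fromTypedCall)
    assert(fromLiterals == fromMapping)
    println(fromLiterals)
  }
}
```

This would also explain why only scalePosWeight failed in the question: Array(0.5, 1, 2) already contains a Double literal, so Scala infers Array[Double] for it, while Array(1, 5, 9) contains only Int literals.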