pyspark apache-spark-ml

PySpark TypeError: object of type 'ParamGridBuilder' has no len()


I am trying to tune my model on Databricks using PySpark.

I receive the following error: TypeError: object of type 'ParamGridBuilder' has no len()

My code is listed below.

from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator



als = ALS(userCol = "userId",itemCol="movieId", ratingCol="rating",  coldStartStrategy="drop", nonnegative = True, implicitPrefs = False)

# Imports the ParamGridBuilder class
from pyspark.ml.tuning import ParamGridBuilder 

# Creates a ParamGridBuilder, and adds hyperparameters
param_grid = ParamGridBuilder().addGrid(als.rank, [5,10,20,40]).addGrid(als.maxIter, [5,10,15,20]).addGrid(als.regParam,[0.01,0.001,0.0001,0.02]) 

evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",predictionCol="prediction")

# Imports the CrossValidator class
from pyspark.ml.tuning import CrossValidator 

# Creates the cross validator and tells Spark what to use when training and evaluating
cv = CrossValidator(estimator = als,
                    estimatorParamMaps = param_grid,
                    evaluator = evaluator,
                    numFolds = 5) 

model = cv.fit(training) 

TypeError: object of type 'ParamGridBuilder' has no len()

Full Error Log:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<command-1952169986445972> in <module>()
----> 1 model = cv.fit(training)
      2 
      3 # Extract best combination of values from cross validation
      4 
      5 best_model = model.bestModel

/databricks/spark/python/pyspark/ml/base.py in fit(self, dataset, params)
    130                 return self.copy(params)._fit(dataset)
    131             else:
--> 132                 return self._fit(dataset)
    133         else:
    134             raise ValueError("Params must be either a param map or a list/tuple of param maps, "

/databricks/spark/python/pyspark/ml/tuning.py in _fit(self, dataset)
    279         est = self.getOrDefault(self.estimator)
    280         epm = self.getOrDefault(self.estimatorParamMaps)
--> 281         numModels = len(epm)

Solution

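  • It simply means that your object does not have a length (unlike a list). CrossValidator calls len() on estimatorParamMaps, and a ParamGridBuilder is only the builder, not the grid itself. Thus, in your line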

    param_grid = ParamGridBuilder()
        .addGrid(als.rank, [5,10,20,40])
        .addGrid(als.maxIter, [5,10,15,20])
        .addGrid(als.regParam, [0.01,0.001,0.0001,0.02])
    

    You should add .build() at the end to actually construct the grid. build() returns a list of param maps, which is what len() needs.
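
To see why the builder itself has no length while the built grid does, here is a minimal pure-Python sketch of the same builder pattern. MiniGridBuilder is a hypothetical stand-in written for illustration and uses plain strings as parameter names; it is not PySpark's implementation, only the same shape: addGrid() collects values and returns the builder, and build() expands them into a list.

```python
from itertools import product


class MiniGridBuilder:
    """Hypothetical stand-in for a grid builder; not PySpark's implementation."""

    def __init__(self):
        self._grid = {}

    def addGrid(self, param, values):
        # Record the candidate values and return self so calls can be chained.
        self._grid[param] = list(values)
        return self

    def build(self):
        # Expand the recorded values into one dict per combination.
        keys = list(self._grid)
        value_lists = [self._grid[k] for k in keys]
        return [dict(zip(keys, combo)) for combo in product(*value_lists)]


builder = (
    MiniGridBuilder()
    .addGrid("rank", [5, 10, 20, 40])
    .addGrid("maxIter", [5, 10, 15, 20])
    .addGrid("regParam", [0.01, 0.001, 0.0001, 0.02])
)

# The builder defines no __len__, so len() raises TypeError.
try:
    len(builder)
    builder_error = None
except TypeError as exc:
    builder_error = str(exc)

# The built grid is an ordinary list, so len() works.
param_maps = builder.build()
num_models = len(param_maps)

print(builder_error)
print(num_models)
```

In the question's code the equivalent change is to end the param_grid chain with .build() before passing it to CrossValidator as estimatorParamMaps. With four values for each of the three parameters, the grid then holds 4 × 4 × 4 = 64 combinations, each of which is fitted once per fold.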