How to print the probability of prediction in LogisticRegressionWithLBFGS for pyspark


I am using Spark 1.5.1. In PySpark, after I fit the model using:

model = LogisticRegressionWithLBFGS.train(parsedData)

I can print the prediction using:

model.predict(p.features)

Is there a function that also returns the probability score along with the prediction?


Solution

  • You have to clear the threshold first; predict then returns the probability of the positive class (label 1) instead of a 0/1 label. This works only for binary classification:

     from pyspark.mllib.classification import LogisticRegressionWithLBFGS, LogisticRegressionModel
     from pyspark.mllib.regression import LabeledPoint
    
     parsed_data = [LabeledPoint(0.0, [4.6,3.6,1.0,0.2]),
                    LabeledPoint(0.0, [5.7,4.4,1.5,0.4]),
                    LabeledPoint(1.0, [6.7,3.1,4.4,1.4]),
                    LabeledPoint(0.0, [4.8,3.4,1.6,0.2]),
                    LabeledPoint(1.0, [4.4,3.2,1.3,0.2])]   
    
     # sc is an existing SparkContext (predefined in the pyspark shell)
     model = LogisticRegressionWithLBFGS.train(sc.parallelize(parsed_data))
     model.threshold
     # 0.5
     model.predict(parsed_data[2].features)
     # 1
    
     # With the threshold cleared, predict returns the probability of class 1
     model.clearThreshold()
     model.predict(parsed_data[2].features)
     # 0.9873840020002339
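
If you want the label and the probability together, one option is to compute both yourself from the model's coefficients. The following is a minimal sketch in plain Python (no Spark needed) of what a binary logistic regression model does at prediction time: it applies the logistic function to the dot product of weights and features plus the intercept, then compares the result to the threshold. The function names and the weights used here are made up for illustration; with a trained model you would presumably pass in something like model.weights.toArray() and model.intercept instead.

```python
import math


def positive_class_probability(weights, intercept, features):
    """Probability of label 1 for a binary logistic regression model."""
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    # Evaluate the logistic function in a way that avoids overflow
    # when the margin is a large negative number.
    if margin > 0:
        return 1.0 / (1.0 + math.exp(-margin))
    exp_margin = math.exp(margin)
    return exp_margin / (1.0 + exp_margin)


def predict_with_probability(weights, intercept, features, threshold=0.5):
    """Return (label, probability); label is 1 only if probability > threshold."""
    prob = positive_class_probability(weights, intercept, features)
    label = 1 if prob > threshold else 0
    return label, prob


# Hypothetical coefficients, not taken from the model trained above.
example_weights = [1.0, -1.0]
example_intercept = 0.0

label, prob = predict_with_probability(example_weights, example_intercept, [2.0, 1.0])
print(label, prob)
```

Computing it this way leaves the model's threshold untouched. If you clear the threshold instead, setting it back (for example with model.setThreshold(0.5)) should restore the 0/1 output of predict.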