I'm trying to extract the predicted probability from a logistic regression model using the ML pipeline and DataFrame API. The predicted probabilities come back as a vector column that stores the probability for each class (0, 1), as shown below. How can I extract only the probability for class 1? Thank you!
prob
"[0.13293408418007766,0.8670659158199223]"
"[0.1335112097146626,0.8664887902853374]"
A UDF like this should work:
import org.apache.spark.sql.functions.udf
import org.apache.spark.mllib.linalg.Vector

// Element 1 of the probability vector is the probability of class 1.
val getPOne = udf((v: Vector) => v(1))
model.transform(testDf).select(getPOne($"probability"))
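Note that from Spark 2.0 onward the ML pipeline emits `org.apache.spark.ml.linalg.Vector` rather than the `mllib` type, so the UDF's parameter type has to change to match. Below is a minimal, self-contained sketch of the same idea under that assumption. The `predictions` DataFrame is a hypothetical stand-in for `model.transform(testDf)`, built from the two rows in the question, and the names `ExtractClassOneProbability`, `classOne` and `p1` are made up for illustration.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object ExtractClassOneProbability {
  // Index 1 of the probability vector holds the probability of class 1.
  def classOne(v: Vector): Double = v(1)

  // Wrap the plain function as a UDF so it can be applied to a Vector column.
  val getPOne = udf((v: Vector) => classOne(v))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("extract-class-one-probability")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for model.transform(testDf):
    // a single Vector column named "probability".
    val predictions = Seq(
      Tuple1(Vectors.dense(0.13293408418007766, 0.8670659158199223)),
      Tuple1(Vectors.dense(0.1335112097146626, 0.8664887902853374))
    ).toDF("probability")

    // Select only the class-1 probability as a plain Double column.
    predictions.select(getPOne(col("probability")).alias("p1")).show(false)

    spark.stop()
  }
}
```

If you are on Spark 3.0 or later, `org.apache.spark.ml.functions.vector_to_array` may be worth a look as well, since it converts the vector column to an array column that you can index without writing a UDF.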