I trained a Spark ML model, scored my holdout dataset with it, and now need to look up the prediction for specific entities.
How can I figure out which prediction is for whom? Is there a way I can add the entity primary key (e.g. Member_ID) to my prediction output?
More specifically: to score the dataset, I used:
predictions = trained_model.transform(holdout_data)
It produces a DataFrame with the columns "features", "label", and "prediction" (label is the response variable).
How do I find out the corresponding Member_ID for each prediction?
Does holdout_data only contain the columns ["features", "label"]? If so, add Member_ID to it.
The .transform() method of a pyspark.ml model returns a new DataFrame containing all of holdout_data's columns plus the extra prediction column, so if Member_ID is there to begin with, the problem is solved.