Initialize the list of lists:
data = [[1.0, 0.635165, 0.0], [1.0, 0.766586, 1.0], [1.0, 0.724564, 1.0],
        [1.0, 0.766586, 1.0], [1.0, 0.889199, 1.0], [1.0, 0.966586, 1.0],
        [1.0, 0.535165, 0.0], [1.0, 0.55165, 0.0], [1.0, 0.525165, 0.0],
        [1.0, 0.5595165, 0.0]]
Create the Pandas DataFrame:
import pandas as pd
df = pd.DataFrame(data, columns=['y', 'prob', 'y_predict'])
Print the DataFrame:
print(df)
For this data-set, I want to find:
How can I do this in Python?
You can calculate the false positive rate and the true positive rate at different threshold levels as follows:
import numpy as np

def roc_curve(y_true, y_prob, thresholds):
    fpr = []
    tpr = []
    for threshold in thresholds:
        # Predict 1 when the probability reaches the threshold, else 0
        y_pred = np.where(y_prob >= threshold, 1, 0)
        # Confusion-matrix counts at this threshold
        fp = np.sum((y_pred == 1) & (y_true == 0))
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        tn = np.sum((y_pred == 0) & (y_true == 0))
        # FPR = FP / (FP + TN), TPR = TP / (TP + FN)
        fpr.append(fp / (fp + tn))
        tpr.append(tp / (tp + fn))
    return [fpr, tpr]
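One thing to watch for with the sample data from the question: every value in the y column is 1.0, so there are no negative examples and fp + tn is zero at every threshold. The false positive rate is then 0/0, which is undefined. Below is a minimal sketch of a guarded variant that returns nan in that case instead of dividing by zero. The name roc_points and the three thresholds are my own choices for illustration, and the columns are taken from a plain NumPy array rather than the DataFrame so the example is self-contained.

```python
import numpy as np

def roc_points(y_true, y_prob, thresholds):
    """Hypothetical guarded variant: nan when a rate's denominator is zero."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    fpr, tpr = [], []
    for threshold in thresholds:
        y_pred = y_prob >= threshold  # boolean predictions at this threshold
        fp = np.sum(y_pred & (y_true == 0))
        tp = np.sum(y_pred & (y_true == 1))
        fn = np.sum(~y_pred & (y_true == 1))
        tn = np.sum(~y_pred & (y_true == 0))
        # Guard against classes that never occur in y_true
        fpr.append(fp / (fp + tn) if (fp + tn) > 0 else float("nan"))
        tpr.append(tp / (tp + fn) if (tp + fn) > 0 else float("nan"))
    return fpr, tpr

# Same rows as in the question: columns are y, prob, y_predict
data = np.array([
    [1.0, 0.635165, 0.0], [1.0, 0.766586, 1.0], [1.0, 0.724564, 1.0],
    [1.0, 0.766586, 1.0], [1.0, 0.889199, 1.0], [1.0, 0.966586, 1.0],
    [1.0, 0.535165, 0.0], [1.0, 0.55165, 0.0], [1.0, 0.525165, 0.0],
    [1.0, 0.5595165, 0.0],
])
y_true, y_prob = data[:, 0], data[:, 1]

thresholds = [0.5, 0.6, 0.8]  # illustrative values, not from the question
fpr, tpr = roc_points(y_true, y_prob, thresholds)
print("TPR:", tpr)  # 10, 6 and 2 of the 10 positives are at or above each threshold
print("FPR:", fpr)  # nan throughout: the sample has no negatives
```

Passing df['y'] and df['prob'] instead of the array slices should work the same way, since np.asarray converts a pandas Series to an array. To get a meaningful ROC curve, the labels need to contain both 0s and 1s.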