I have highly imbalanced data, so for binary classification I convert the predicted probabilities of class 1 to labels using a threshold of 0.06.
I want to show the probabilities to management, so I need to adjust them so that 0.06 becomes the new 50% boundary.
That is, I want my low probabilities, like 0.045, 0.067, or 0.01, to be recalculated as higher percentages.
I guess I should multiply them by something, but I don't know how to find the value.
Data for reference:
id probability
_____________________
168835 0.529622
168836 0.870282
168837 0.988074
180922 0.457827
78352 0.272279
...
320739 0.003046
329237 0.692332
329238 0.926343
329239 0.994264
320741 0.002714
Not sure if this is still useful after a year, but what you have to do is apply the inverse of your probability function to get back the x values (the log-odds), shift them so that the threshold lands at zero, and reapply the probability function to get the new probabilities. Multiplying won't work unless you are using a linear function, which I'm guessing is not the case.
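Written out, and assuming the model's output is a sigmoid of some score (an assumption, since the question does not say which model is used), these steps amount to the following, where t is the old threshold (0.06 here):

$$
p_{\text{new}} = \sigma\big(\operatorname{logit}(p) - \operatorname{logit}(t)\big), \qquad \operatorname{logit}(p) = \ln\frac{p}{1-p}, \qquad \sigma(z) = \frac{1}{1+e^{-z}}
$$

Setting p = t gives σ(0) = 0.5, so the old threshold lands exactly on the 50% boundary.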
Assuming you use a standard logistic regression, your code for recalculating the probabilities should look something like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({"probability_old": [0.529622, 0.870282, 0.988074, 0.457827, 0.272279, 0.003046, 0.692332, 0.926343, 0.994264, 0.002714, 0.06, 0.5]})

def sig(z):
    # logistic (sigmoid) function: log-odds -> probability
    return 1 / (1 + np.exp(-z))

def inv_sig(p):
    # logit, the inverse of sig: probability -> log-odds
    return np.log(p / (1 - p))

y_0 = 0.06  # old decision threshold that should map to 0.5
# inv_sig(y_0) ≈ -2.75
df["probability_new"] = sig(inv_sig(df["probability_old"]) - inv_sig(y_0))
Results:
id | probability_old | probability_new |
---|---|---|
0 | 0.529622 | 0.946352 |
1 | 0.870282 | 0.990576 |
2 | 0.988074 | 0.999230 |
3 | 0.457827 | 0.929723 |
4 | 0.272279 | 0.854264 |
5 | 0.003046 | 0.045680 |
6 | 0.692332 | 0.972417 |
7 | 0.926343 | 0.994950 |
8 | 0.994264 | 0.999632 |
9 | 0.002714 | 0.040892 |
10 | 0.060000 | 0.500000 |
11 | 0.500000 | 0.940000 |
Hopefully this image will clarify the logic behind the code
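For what it's worth, the same remapping can also be written in closed form, without the log/exp round trip. Expanding sig(inv_sig(p) - inv_sig(t)) gives p(1 - t) / (p(1 - t) + t(1 - p)). The sketch below assumes the same sigmoid-based setup as above; `rescale_probability` is a hypothetical helper name, not something from a library.

```python
import numpy as np

def rescale_probability(p, threshold):
    """Remap probabilities so that `threshold` lands on 0.5.

    Hypothetical helper: algebraically equal to
    sig(inv_sig(p) - inv_sig(threshold)), but it also handles
    p == 0 and p == 1 without taking log(0).
    """
    p = np.asarray(p, dtype=float)
    num = p * (1.0 - threshold)
    return num / (num + threshold * (1.0 - p))

old = np.array([0.06, 0.5, 0.003046, 0.0, 1.0])
new = rescale_probability(old, threshold=0.06)
print(np.round(new, 6))
```

With threshold=0.06 this maps 0.06 to 0.5 and 0.5 to 0.94, matching the rows in the results table, while 0 and 1 stay fixed.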