I'm working on a fraud detection system, and I would like to optimize it to take into account the cost (in $) of the reviewing department. I would like to adjust the loss based on the final cost: 0 for a true negative, the review cost for a true positive or a false positive, and the fraudulent amount for a false negative. The metric should be the sum of those per-sample costs, and look like this:
    def fraudmetric(ytrue, ypred, fraudulentamt, reviewcost):
        cost = [0 if yt==0 and yp==0 else           ## TN
                reviewcost if yt==1 and yp==1 else  ## TP
                reviewcost if yt==0 and yp==1 else  ## FP
                fa if yt==1 and yp==0 else 0        ## FN
                for yt, yp, fa in zip(ytrue, ypred, fraudulentamt, reviewcost)]
        return np.sum(cost)
Is there an elegant way to do this in Python? Thanks
You can easily implement a binary table like this using... well, a table. It'd look like this:
    metric_table = [[ 0, reviewcost],
                    [fa, reviewcost]]
    metric_value = metric_table[yt][yp]  # for a given yt, yp
I've taken the liberty of fixing what seems to be a bug in your code: you zipped four iterables but only unpacked three values. I assume you want the element from reviewcost rather than the whole list; change it back if that's not correct. Also, there's no need to build a temporary list just to sum it, so I've collapsed it into a sum over a generator:
    def fraudmetric(ytrue, ypred, fraudulentamt, reviewcost):
        return sum([[ 0, rc],
                    [fa, rc]][yt][yp]
                   for yt, yp, fa, rc in zip(ytrue, ypred, fraudulentamt, reviewcost))
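Since your original already calls np.sum, here's a fully vectorized sketch as well, which skips the Python-level loop entirely. This assumes ytrue/ypred are 0/1 arrays and the costs are array-like; the name fraudmetric_np is just mine:

```python
import numpy as np

def fraudmetric_np(ytrue, ypred, fraudulentamt, reviewcost):
    ytrue = np.asarray(ytrue)
    ypred = np.asarray(ypred)
    fraudulentamt = np.asarray(fraudulentamt)
    reviewcost = np.asarray(reviewcost)
    # Anything flagged (ypred == 1) incurs the review cost (TP and FP);
    # otherwise a missed fraud (ytrue == 1) costs the fraudulent amount (FN),
    # and a true negative costs nothing.
    cost = np.where(ypred == 1, reviewcost,
                    np.where(ytrue == 1, fraudulentamt, 0))
    return cost.sum()
```

For example, with one case each of TN, TP, FP and FN and a flat review cost of 5, the total is 0 + 5 + 5 + the missed fraudulent amount.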