python scikit-learn classification loss

Use a $ amount as loss to measure classification performance


I'm working on a fraud detection system, and I would like to optimize it to take into account the cost (in $) incurred by the review department.

I would like to adjust the loss depending on the final cost:

  • If a transaction is fraudulent but the amount is small, it may be more expensive to spend time on a review.
  • A non-fraudulent transaction that is reviewed still has a cost.
  • Some transactions can be really costly and must be caught.

The metric should be the sum of:

  • TN --> No cost
  • FP --> review cost
  • TP --> review cost + the amount of money we got back from the fraud (if it's not the totality)
  • FN --> Total amount of the fraudulent transaction

The metric should look like this:

def fraudmetric(ytrue, ypred, fraudulentamt, reviewcost):
  cost = [0 if yt==0 and yp==0 else          ## TN
          reviewcost if yt==1 and yp==1 else ## TP
          reviewcost if yt==0 and yp==1 else ## FP
          fa if yt==1 and yp==0 else 0       ## FN
          for yt, yp, fa in zip(ytrue, ypred, fraudulentamt, reviewcost)]
  return np.sum(cost)

Is there an elegant way to do that in Python?

Thanks


Solution

  • You can easily implement a binary table like this using... well, a table. It'd look like this:

    metric_table = [[0, reviewcost],
                    [fa, reviewcost]]
    metric_value = metric_table[yt][yp]  # for a given yt, yp
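
To make the indexing concrete, here is a minimal sketch that evaluates the table for all four label combinations. The values reviewcost = 5.0 and fa = 200.0 are hypothetical, chosen only for illustration, and the labels are assumed to be the integers 0 (legitimate) and 1 (fraud).

```python
# Hypothetical values for a single transaction (illustration only).
reviewcost = 5.0
fa = 200.0

# Rows are indexed by the true label, columns by the predicted label.
metric_table = [[0, reviewcost],
                [fa, reviewcost]]

# Look up the cost of every (true label, predicted label) combination.
costs = {(yt, yp): metric_table[yt][yp] for yt in (0, 1) for yp in (0, 1)}
print(costs)
```

Only the missed fraud, (1, 0), picks up the transaction amount; both flagged cases pick up the review cost.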
    

    In the function below, I've taken the liberty of fixing what seems to be a bug in your code: you zipped four iterables but unpacked only three values. I assume you want the per-transaction element from reviewcost rather than the whole thing; adjust it if that's not correct. I also don't see the need to build a temporary list just to sum it up, so I've collapsed it into a sum over a generator:

    def fraudmetric(ytrue, ypred, fraudulentamt, reviewcost):
        # Rows are indexed by the true label, columns by the predicted label:
        # [[TN, FP],
        #  [FN, TP]]
        return sum([[ 0, rc],
                    [fa, rc]][yt][yp]
                   for yt, yp, fa, rc in zip(ytrue, ypred, fraudulentamt, reviewcost))
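
Since the question already uses NumPy, the same cost table can also be sketched in vectorized form. This is only one possible approach, not part of the original answer: the name fraudmetric_vectorized is hypothetical, the sample data is made up, and the sketch assumes reviewcost may be either a single number or one value per transaction.

```python
import numpy as np

def fraudmetric_vectorized(ytrue, ypred, fraudulentamt, reviewcost):
    """Total cost in $, using the same cost table as fraudmetric."""
    ytrue = np.asarray(ytrue).astype(bool)
    ypred = np.asarray(ypred).astype(bool)
    fraudulentamt = np.asarray(fraudulentamt, dtype=float)
    # Assumption: reviewcost is a scalar or has one value per transaction.
    reviewcost = np.broadcast_to(np.asarray(reviewcost, dtype=float), ytrue.shape)

    # Flagged transactions (TP and FP) cost a review, missed frauds (FN)
    # cost the transaction amount, and true negatives cost nothing.
    cost = np.where(ypred, reviewcost, np.where(ytrue, fraudulentamt, 0.0))
    return cost.sum()

# Hypothetical data, for illustration only: one TN, one FP, one TP, one FN.
ytrue = [0, 0, 1, 1]
ypred = [0, 1, 1, 0]
amounts = [20.0, 50.0, 300.0, 1000.0]
total = fraudmetric_vectorized(ytrue, ypred, amounts, reviewcost=5.0)
print(total)
```

With this data the two reviews cost 5.0 each and the missed fraud costs 1000.0, so the total is 1010.0.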