Tags: keras, loss-function, loss

Custom loss function for sports betting


I'm trying to build a model that predicts the most profitable tennis matches to bet on. For that reason I included the betting odds in the yTrue variable, so it may look like this:

yTrue = np.array([[0,1,1.38,3.05],[1,0,1.05,10.24],[1,0,1.04,7.85],[1,0,1.22,3.71],[1,0,1.06,6.69],[1,0,1.48,2.46],[0,1,1.61,2.22]])

Columns 0 and 1: whether the favorite/underdog won (1) or lost (0)

Columns 2 and 3: the favorite's/underdog's odds

Let's say the bet on each match is 1000 units. In this dataset the first and last matches were won by the underdog and the other five by the favorite. If we bet 1000 units on all 7 matches and every prediction were correct, we would make a profit of 4120.0 units, which is the maximum achievable profit. With predictions like yPred = np.array([[0,1],[0,1],[1,0],[1,0],[0,1],[1,0],[1,0]]), only 4 of the 7 picks are correct and the net result is -210.0 units.
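
Spelling that out match by match (a correct pick earns (odds - 1) * 1000 units, a wrong pick loses the 1000-unit stake):

maxPotentialProfit = 1000 * ((3.05-1) + (1.05-1) + (1.04-1) + (1.22-1) + (1.06-1) + (1.48-1) + (2.22-1))  # = 4120.0
realProfit = 2050 - 1000 + 40 + 220 - 1000 + 480 - 1000  # correct picks: matches 1, 3, 4, 6; total = -210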

In normal Python the loss function could look like this:

import numpy as np
yTrue = np.array([[0,1,1.38,3.05],[1,0,1.05,10.24],[1,0,1.04,7.85],[1,0,1.22,3.71],[1,0,1.06,6.69],[1,0,1.48,2.46],[0,1,1.61,2.22]])
yPred = np.array([[0,1], [0,1],[1,0],[1,0],[0,1],[1,0],[1,0]])

def myLoss(yTrue, yPred):
    maxPotentialProfit = 0
    realProfit = 0
    betUnits = 1000
    for a in range(len(yTrue)):
        bet = yTrue[a]
        pred = yPred[a]
        favOdds = bet[2] - 1   # net odds of the favorite
        dogOdds = bet[3] - 1   # net odds of the underdog

        # best case: the winner is always picked
        if bet[0] == 1:
            maxPotentialProfit += favOdds * betUnits
        else:
            maxPotentialProfit += dogOdds * betUnits

        # actual case: a correct pick earns the net odds, a wrong pick loses the stake
        if bet[0] == 1 and pred[0] == 1:
            realProfit += favOdds * betUnits
        elif bet[1] == 1 and pred[1] == 1:
            realProfit += dogOdds * betUnits
        else:
            realProfit -= betUnits

    return maxPotentialProfit - realProfit
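
Calling the function on the arrays above reproduces the numbers from the example:

print(round(myLoss(yTrue, yPred), 2))  # 4120 - (-210) = 4330.0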

But now the real deal: how can I make this differentiable so it works with Keras? :)


Solution

    1. A model's prediction will never sit exactly at either extreme, 0 or 1. It will be something in between; the closer the prediction is to an extreme, the more confident the model is.

    2. It's wasteful to store both the favorite's and the underdog's result, because won + lost = 1; you only need to store one of them and deduce the other.

    So your data should look like this:

    # columns: result (1 = favorite won, 0 = underdog won), favorite odds, underdog odds
    yTrue = np.array([[0, 1.38, 3.05], [1, 1.05, 10.24],[1, 1.04, 7.85]])
    
    # the predicted probability that favorite wins
    yPred = np.array([0.7, 0.8, 0.9])
    
    3. Unless you want betUnits to be part of the model's prediction, there's no need to involve it in the loss function: multiplying the loss by a constant only rescales the gradients, which the learning rate absorbs, so it does not affect training.

    4. The native Keras way to calculate maxPotentialProfit is:

    import keras.backend as K
    
    # split yTrue into 3 columns for easy handling
    result = yTrue[:, 0]
    favOdds = yTrue[:, 1]
    dogOdds = yTrue[:, 2]
    
    # logic: check each element in result, 
    # if 1, add the corresponding element in favOdds,
    # else, add the corresponding element in dogOdds.
    maxPotentialProfit = K.sum(K.switch(K.equal(result, 1), favOdds, dogOdds))
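
    As a quick sanity check of that selection logic outside Keras, the same thing can be mirrored in plain NumPy on the 3-row sample above (this snippet is only for checking, it is not part of the loss):

    result, favOdds, dogOdds = yTrue[:, 0], yTrue[:, 1], yTrue[:, 2]
    maxPotentialProfit = np.where(result == 1, favOdds, dogOdds).sum()  # 3.05 + 1.05 + 1.04 = 5.14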
    
    5. Since the prediction will never be exactly 0 or 1, a conditional like if bet[0] == 1 and pred[0] == 1 won't work. Instead, let's interpret an output of, say, 0.7 as the model betting 0.7 units on the favorite and 0.3 units on the underdog.
    # split the prediction
    favPred = yPred
    dogPred = 1 - yPred
    
    # logic: check each element in result,
    # if 1, add the corresponding element of favOdds*favPred,
    # else, add the corresponding element of dogOdds*dogPred
    realProfit = K.sum(K.switch(K.equal(result, 1), favOdds*favPred, dogOdds*dogPred))
    
    # initial cost of betting: 1 unit staked on each match
    realProfit -= K.cast(K.shape(yPred)[0], K.floatx())  # the Keras equivalent of len(yPred)
    # However, since this term does not depend on the predicted values,
    # it has no effect on the gradients, so we can drop it.
    
    return maxPotentialProfit - realProfit
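
    Plugging the 3-row sample into these pieces (and dropping the constant cost term) gives a concrete value to compare against later:

    # maxPotentialProfit = 3.05 + 1.05 + 1.04             = 5.14
    # realProfit         = 3.05*0.3 + 1.05*0.8 + 1.04*0.9 = 2.691
    # loss               = 5.14 - 2.691                   = 2.449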
    
    6. You may notice that the calculations for maxPotentialProfit and realProfit are very similar, so instead of computing them separately we can combine them into one line.

    Here's the full code:

    import keras.backend as K
    
    def myLoss(yTrue, yPred):
        result = yTrue[:, 0]
        favOdds = yTrue[:, 1]
        dogOdds = yTrue[:, 2]
        favPred = yPred
        dogPred = 1 - yPred
        return K.sum(K.switch(K.equal(result, 1), favOdds*(1-favPred), dogOdds*(1-dogPred)))
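
    On the 3-row sample this one-liner gives the same 2.449 as the step-by-step version above, which confirms the merge:

    # 3.05*(1-0.3) + 1.05*(1-0.8) + 1.04*(1-0.9)
    # = 2.135 + 0.21 + 0.104 = 2.449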
    
    7. That's all. Just one note: you can't call myLoss on plain NumPy arrays to test it, because it expects Keras tensors as inputs. You need to build a Keras model and use myLoss as its loss function to see it working; that is beyond the scope of this question, but a rough sketch of the wiring is shown below.
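
    A minimal, hypothetical sketch of that wiring (none of it comes from the original answer, and it assumes the myLoss defined in point 6 is already in scope): xTrain, yTrain, numFeatures, the layer sizes and the optimizer are all illustrative placeholders. The single sigmoid output has shape (batch, 1), so it is flattened before being handed to myLoss, which expects a 1-D yPred.

    import numpy as np
    import keras.backend as K
    from keras.models import Sequential
    from keras.layers import Dense

    # toy placeholder data, only to make the sketch runnable
    numSamples, numFeatures = 100, 8
    xTrain = np.random.rand(numSamples, numFeatures)
    yTrain = np.hstack([
        np.random.randint(0, 2, (numSamples, 1)),       # result: 1 = favorite won
        np.random.uniform(1.01, 2.0, (numSamples, 1)),  # favorite odds
        np.random.uniform(1.5, 12.0, (numSamples, 1)),  # underdog odds
    ])

    model = Sequential([
        Dense(32, activation='relu', input_shape=(numFeatures,)),
        Dense(1, activation='sigmoid'),                 # predicted P(favorite wins)
    ])

    # wrap myLoss so the (batch, 1) output is flattened to the 1-D shape it expects
    model.compile(optimizer='adam',
                  loss=lambda yTrue, yPred: myLoss(yTrue, K.flatten(yPred)))

    model.fit(xTrain, yTrain, epochs=5, batch_size=16)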