python, machine-learning, scikit-learn, artificial-intelligence, loss-function

How do I implement a custom error function in a sklearn classifier?


I want to treat errors from overestimates and underestimates differently during model training (like The Price is Right, where overshooting is penalised more). I don't want to rewrite the entire MLP, regression, DecisionTree, etc. algorithms in sklearn just to implement my custom cost function and its derivative. Is there a way to define a function that any classifier can use to override the default loss? This is an example of what I'm looking for:

def myCustomError(y_preds, y_actuals):

    #Calculate The Price is Right style error

    return #not MSE


from sklearn import #Classifier


c = #Classifier(loss=myCustomError)

If I can't do this in sklearn and have to use TensorFlow or some other library, please let me know.


Solution

  • When you need a custom loss function, a neural-net framework is typically the right tool rather than sklearn: most sklearn estimators don't let you supply your own loss or optimisation objective. If you want to stick with sklearn, some estimators let you adjust sample importance (sample_weight in fit) or class balancing (class_weight), but that only reweights examples rather than reshaping the loss, which doesn't seem to be what you're after. A sketch of that workaround is below.
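
    A minimal, hedged sketch of the sample-weighting route (the data and weights here are arbitrary, illustrative choices): sample_weight rescales each example's contribution to the built-in loss, so it can't make overestimates cost more than underestimates on the same sample.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    
    #Illustrative data
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=100)
    
    #Upweight some samples (arbitrary choice for illustration);
    #most sklearn fit() methods accept sample_weight
    weights = np.ones(len(y))
    weights[-20:] = 5.0
    
    model = LinearRegression().fit(X, y, sample_weight=weights)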

    I don't want to rewrite the entire MLP, regression, DecisionTree, etc. algorithms in sklearn just to implement my custom cost function and relevant derivative.

    Not sure about decision trees, but MLPs and regression are straightforward to implement in PyTorch. Also, when you define a custom loss function there, autograd takes care of the derivative for you; a minimal sketch of that follows.
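
    A self-contained illustration of the autograd point (the numbers are arbitrary): calling backward() on a scalar loss computes the gradient automatically, with no hand-derived derivative.

    import torch
    
    pred = torch.tensor([2.0], requires_grad=True)
    target = torch.tensor([1.0])
    loss = ((pred - target) ** 2).sum()
    loss.backward()
    print(pred.grad)  #tensor([2.]) == 2 * (pred - target)

    Here's a simple regression model using a custom loss function that penalises overestimates more strongly than underestimates: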

    Some mock data for this example (2D feature space, and target is a scalar):

    #Test data
    import numpy as np
    
    np.random.seed(0)
    X0 = np.random.randn(128, 2) + 5
    X1 = np.random.randn(128, 2)
    X = np.concatenate([X0, X1], axis=0)
    y = np.concatenate([np.linspace(0, 3, 128), np.linspace(-10, -5, 128)]).reshape(-1, 1)
    

    Simple regression net with a custom loss function in PyTorch:

    import torch
    from torch import nn
    from torch import optim
    
    #Example of a custom loss function
    #Treats overestimates differently to underestimates
    def custom_loss(predictions, target):
        errors = predictions - target
        overestimates = errors[errors > 0]
        underestimates = errors[errors < 0]

        #penalise the squared error of the overestimates more heavily
        loss = (overestimates ** 2).sum() + (0.5 * underestimates ** 2).sum()
        return loss / len(target)
    
    #Define a simple regression neural net
    torch.manual_seed(0)
    model = nn.Sequential(
        nn.Linear(2, 4),
        nn.ReLU(),
        nn.Linear(4, 4),
        nn.ReLU(),
        nn.Linear(4, 1)
    )
    
    #Data to tensors
    X_tensor = torch.tensor(X).to(torch.float32)
    y_tensor = torch.tensor(y).to(torch.float32)
    
    #Choose an optimiser and start training
    optimiser = optim.RMSprop(model.parameters())
    n_epochs = 5
    
    model.train()
    for epoch in range(n_epochs):    
        predictions = model(X_tensor)
        loss = custom_loss(predictions, y_tensor)
        print('epoch', epoch, 'loss:', loss.item())
        
        #Backpropagation and optimisation step
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
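
    Once trained, you can get predictions from the net (a usage sketch added here, with arbitrary input values): switch to eval mode and disable gradient tracking for inference.

    #Inference on new, illustrative inputs
    model.eval()
    with torch.no_grad():
        new_points = torch.tensor([[5.0, 5.0], [0.0, 0.0]])
        print(model(new_points))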
    

    For brevity, this example leaves out details like scaling and batching the data (and keeping a validation set); a sketch of the batching part follows below.
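
    As a hedged sketch of that batching detail (continuing from the tensors, model, loss, and optimiser defined above; the batch size is an arbitrary choice), the standard approach is torch.utils.data:

    from torch.utils.data import TensorDataset, DataLoader
    
    #Wrap the tensors and iterate in shuffled mini-batches
    dataset = TensorDataset(X_tensor, y_tensor)
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    
    model.train()
    for epoch in range(n_epochs):
        for X_batch, y_batch in loader:
            loss = custom_loss(model(X_batch), y_batch)
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()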