python, feature-selection, supervised-learning

Use a different estimator based on value


I'm trying to build a regressor that chooses its model based on the value of one feature. That is, one of my columns (say, gender) is more important than the others; it is, of course, distinct from the target value Y.

I want to say:
- If the gender is Male then use the randomForest regressor
- Else use another regressor

Is this possible using sklearn or any other Python library?


Solution

  • You can implement your own regressor that delegates to different models internally. Assuming gender is the first feature and is encoded as 1 for male, you could do something like:

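    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.neighbors import KNeighborsRegressor
    
    class MyRegressor():
        '''uses different regressors internally'''
        def __init__(self):
            # default hyperparameters; tune both models as needed
            self.randomForest = RandomForestRegressor()
            self.kNN = KNeighborsRegressor()
    
        def fit(self, X, y):
            '''fits each regressor on its own subset of the rows'''
            # assumes X and y are NumPy arrays, gender is column 0, and 1 = male
            is_male = X[:, 0] == 1
            self.randomForest.fit(X[is_male], y[is_male])
            self.kNN.fit(X[~is_male], y[~is_male])
            return self
    
        def predict(self, X):
            '''predicts each row with the regressor matching its gender value'''
            is_male = X[:, 0] == 1
            results = np.zeros(X.shape[0])
            # skip empty groups: sklearn estimators reject zero-row input
            if is_male.any():
                results[is_male] = self.randomForest.predict(X[is_male])
            if (~is_male).any():
                results[~is_male] = self.kNN.predict(X[~is_male])
            return results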
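
If the wrapper should also work with sklearn utilities such as cross-validation or pipelines, one possible variation is sketched below. It is only an illustration under stated assumptions: the class name `SplitByValueRegressor` and its `column` and `value` parameters are made up for this example, the data is synthetic, and the gender flag is assumed to be encoded as 0/1 in column 0.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor


class SplitByValueRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: one regressor for rows where a column equals
    `value`, another regressor for all remaining rows."""

    def __init__(self, match_estimator, other_estimator, column=0, value=1):
        self.match_estimator = match_estimator
        self.other_estimator = other_estimator
        self.column = column
        self.value = value

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        mask = X[:, self.column] == self.value
        # Clone so the estimators passed in stay unfitted (sklearn convention).
        self.match_estimator_ = clone(self.match_estimator).fit(X[mask], y[mask])
        self.other_estimator_ = clone(self.other_estimator).fit(X[~mask], y[~mask])
        return self

    def predict(self, X):
        X = np.asarray(X)
        mask = X[:, self.column] == self.value
        results = np.zeros(X.shape[0])
        # Skip empty groups: sklearn estimators reject zero-row input.
        if mask.any():
            results[mask] = self.match_estimator_.predict(X[mask])
        if (~mask).any():
            results[~mask] = self.other_estimator_.predict(X[~mask])
        return results


# Synthetic data, invented for illustration: column 0 is the 0/1 group flag.
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(0, 2, size=200), rng.normal(size=(200, 3))])
y = np.where(X[:, 0] == 1, 3.0 * X[:, 1], -2.0 * X[:, 2])
y = y + rng.normal(scale=0.1, size=200)

model = SplitByValueRegressor(
    RandomForestRegressor(n_estimators=50, random_state=0),
    KNeighborsRegressor(n_neighbors=5),
)
model.fit(X, y)
predictions = model.predict(X[:10])
print(predictions.shape)
```

Passing the two estimators in as constructor arguments, rather than hard-coding them, keeps the choice of models and their hyperparameters outside the wrapper. Note that each group needs enough training rows for its model (for example, at least `n_neighbors` rows for the kNN regressor).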