python feature-selection supervised-learning

Use a different estimator based on a feature value

I'm trying to build a regressor whose underlying model depends on the value of one feature. I have several feature columns, and one of them (say, gender) is more important than the others. It is, of course, distinct from the target value Y.

I want to say:
- If the gender is male, use a random forest regressor
- Otherwise, use another regressor

Is this possible using sklearn or any other Python library?

Solution

You could implement your own regressor that wraps two others. Let us assume that gender is the first feature and that it is encoded as 1 for male. Then you could do something like:

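import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor


class MyRegressor:
    '''Uses a different regressor depending on the first feature.'''
    def __init__(self):
        self.randomForest = RandomForestRegressor()
        self.kNN = KNeighborsRegressor()

    def fit(self, X, y):
        '''Fits each regressor on its own subset of the rows.'''
        # Assumes X and y are NumPy arrays and column 0 holds gender (1 = male).
        is_male = X[:, 0] == 1
        self.randomForest.fit(X[is_male], y[is_male])
        self.kNN.fit(X[~is_male], y[~is_male])
        return self

    def predict(self, X):
        '''Routes each row to the regressor trained for its group.'''
        is_male = X[:, 0] == 1
        results = np.zeros(X.shape[0])
        # Skip empty groups: sklearn raises an error on input with zero rows.
        if is_male.any():
            results[is_male] = self.randomForest.predict(X[is_male])
        if (~is_male).any():
            results[~is_male] = self.kNN.predict(X[~is_male])

        return results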
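
If you want the wrapper to plug into the rest of scikit-learn (for example `score` or `cross_val_score`), one option is to derive it from `BaseEstimator` and `RegressorMixin` and pass the inner regressors in as parameters. The following is only a sketch under a few assumptions: `RoutedRegressor` and its `estimators`, `default` and `column` parameters are names I made up for illustration, the routing column is numerically encoded, every group has at least one training row, and the data is synthetic.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor


class RoutedRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: one regressor per value of a routing column."""

    def __init__(self, estimators, default, column=0):
        self.estimators = estimators  # dict: column value -> unfitted regressor
        self.default = default        # regressor for every other value
        self.column = column          # index of the routing feature

    def _masks(self, X):
        # One boolean mask per known key, plus a mask for the remaining rows.
        key = X[:, self.column]
        masks = {k: key == k for k in self.estimators}
        rest = ~np.logical_or.reduce(list(masks.values()))
        return masks, rest

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        masks, rest = self._masks(X)
        # Clone so the estimators passed in stay unfitted (sklearn convention).
        self.fitted_ = {
            k: clone(est).fit(X[masks[k]], y[masks[k]])
            for k, est in self.estimators.items()
        }
        self.default_ = clone(self.default).fit(X[rest], y[rest])
        return self

    def predict(self, X):
        X = np.asarray(X)
        masks, rest = self._masks(X)
        out = np.empty(X.shape[0])
        for k, mask in masks.items():
            if mask.any():  # skip groups that are absent from this batch
                out[mask] = self.fitted_[k].predict(X[mask])
        if rest.any():
            out[rest] = self.default_.predict(X[rest])
        return out


# Synthetic data: column 0 is the routing feature (1 = male in this toy setup).
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(0, 2, 200), rng.normal(size=(200, 2))])
# Each group follows a different relationship, which is where routing helps.
y = np.where(X[:, 0] == 1, 3.0 * X[:, 1], -2.0 * X[:, 2])

model = RoutedRegressor(
    estimators={1: RandomForestRegressor(n_estimators=50, random_state=0)},
    default=KNeighborsRegressor(n_neighbors=5),
).fit(X, y)
pred = model.predict(X)
print(pred.shape, model.score(X, y))
```

Passing the inner regressors as constructor arguments, rather than hard-coding them, lets you swap models or add a third group without touching the class. Keep in mind that each inner model only sees its own subset of rows, so small groups get correspondingly little training data.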