python scikit-learn data-preprocessing

Why does the func parameter of TransformedTargetRegressor need to return a 2-dimensional array rather than a 1-dimensional one?


The documentation of TransformedTargetRegressor says that the parameter func needs to return a 2-dimensional array. Should it not be a 1-dimensional array instead? The target y usually has shape (n_samples,), which is 1-dimensional.

The code below, where both the target y and the output of func are 1-dimensional, runs without error:

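import numpy as np
from sklearn import compose, linear_model

# X_train and yCO_train are the training data (defined elsewhere); yCO_train has shape (n_samples,)
exponentiate = lambda x: np.exp(x)
naturalLog = lambda x: np.log(x)
loglinreg = compose.TransformedTargetRegressor(regressor=linear_model.LinearRegression(), func=naturalLog, inverse_func=exponentiate)
loglinreg.fit(X_train, yCO_train)
loglinreg.score(X_train, yCO_train)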

Solution

  • In the source, func is applied through a FunctionTransformer created with validate=True, which requires 2-dimensional input. This is also consistent with the other option, passing a transformer object directly, since transformers generally expect 2-dimensional input.

    See also the Note in the docs:

    Internally, the target y is always converted into a 2-dimensional array to be used by scikit-learn transformers. At the time of prediction, the output will be reshaped to have the same number of dimensions as y.

    Your example runs because np.log and np.exp are shape-agnostic: during fitting, both functions are actually called on 2-dimensional arrays. You can check this by defining a func that flattens its output:

    def mylog(y):
        return np.log(y).ravel()
    

    Fitting with mylog as func raises the "Expected 2D array, got 1D array instead" error, as expected.
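
The behaviour described above can be checked directly. The following is a minimal sketch, assuming a recent scikit-learn with the default check_inverse=True; the synthetic data and the helper names recording_log, raveling_log and seen_ndims are made up for illustration and are not part of the question. It records the number of dimensions that func actually receives and then tries a func that flattens its output:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration); y is strictly positive so log is defined
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.exp(X @ np.array([0.5, -0.2, 0.1]) + 1.0)  # 1-dimensional target, shape (50,)

seen_ndims = []  # number of dimensions of every array passed to func

def recording_log(y_arr):
    # Hypothetical helper: same as np.log, but remembers the input's ndim
    seen_ndims.append(np.ndim(y_arr))
    return np.log(y_arr)

model = TransformedTargetRegressor(
    regressor=LinearRegression(), func=recording_log, inverse_func=np.exp
)
model.fit(X, y)
pred = model.predict(X)  # reshaped back to the dimensionality of y

def raveling_log(y_arr):
    # Hypothetical helper: flattens the result, so func returns a 1-D array
    return np.log(y_arr).ravel()

bad_model = TransformedTargetRegressor(
    regressor=LinearRegression(), func=raveling_log, inverse_func=np.exp
)
try:
    bad_model.fit(X, y)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print("ndim seen by func:", seen_ndims)
print("prediction shape:", pred.shape)
print("error from 1-D func:", error_message)
```

Even though y is passed in as a 1-dimensional array, every call to func should see a 2-dimensional one, and the predictions come back 1-dimensional, which matches the Note quoted from the docs. If a custom func really must work on flat arrays, one option is to reshape its result with reshape(-1, 1) before returning it.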