python scikit-learn feature-selection sklearn-pandas valueerror

SelectKBest ValueError after Log-Transformation of Target-Variable


I am currently analysing the Ames, Iowa housing prices dataset. I have wrangled the data and removed all missing values, and I am about to do some regression analysis. I want to build three regression models: the first with the two best features, the second with 15 features, and the third with all available variables. I am using SelectKBest for the feature selection. My target variable is 'SalePrice', which I log-transformed. SelectKBest always raises a ValueError. Interestingly, if I do not log-transform 'SalePrice', everything works fine. I checked the dtype of my target variable, and it is a float, as expected.
Could somebody help me out?
I would really appreciate it!


Solution

  • You are using

    SelectKBest(chi2)
    

    According to the documentation of chi2:

    Parameters:

    X : {array-like, sparse matrix}, shape = (n_samples, n_features_in)
        Sample vectors.
    
    y : array-like, shape = (n_samples,)
        Target vector (class labels).
    

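    chi2 only works with classification tasks, not regression: it treats y as a vector of class labels. Predicting the sale price is a regression task, hence the error. The untransformed 'SalePrice' values are whole numbers, so they can still be read as class labels (one class per distinct price), which is likely why no error is raised in that case. After the log transformation the target is continuous and can no longer be interpreted as labels.

    Try f_regression in place of chi2, for example SelectKBest(f_regression, k=2).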
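
The suggestion above can be sketched as follows. This is a minimal illustration, not the asker's actual code: the data is synthetic (five random features, with a made-up price that depends on features 0 and 3), and the variable names are assumptions chosen for the example. Only `SelectKBest` and `f_regression` come from scikit-learn.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic stand-in for the housing data (an assumption for illustration):
# 200 samples, 5 features, price driven mainly by features 0 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
noise = rng.normal(scale=0.1, size=200)
price = np.exp(10 + 0.8 * X[:, 0] + 0.5 * X[:, 3] + noise)

# Log-transformed, continuous target, as in the question.
y = np.log(price)

# f_regression scores each feature against a continuous target,
# so it is suitable for a regression problem.
selector = SelectKBest(score_func=f_regression, k=2)
X_best = selector.fit_transform(X, y)

# Column indices of the features that were kept.
selected = selector.get_support(indices=True)
print(selected)
```

For the other two models in the question, the same pattern should apply with `k=15` and `k="all"`, provided the real dataset has at least 15 features.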