I am currently analysing the Ames, Iowa housing prices dataset.
I have wrangled the data and removed all missing values, and I'm about to do some regression analysis. I want to build three regression models: the first with the two best features, the second with 15 features, and the third with all available variables. I am using SelectKBest for the feature selection. My target variable is 'SalePrice', which I log-transformed.
For some reason, I always get a ValueError from SelectKBest.
Interestingly, if I do not log-transform the 'SalePrice', everything works fine.
I checked the dtype of my target variable, and it is a float, as expected.
Could somebody help me out?
I would really appreciate it!
You are using SelectKBest(chi2). According to the documentation of chi2:
Parameters:
X : {array-like, sparse matrix}, shape = (n_samples, n_features_in) Sample vectors.
y : array-like, shape = (n_samples,) Target vector (class labels).
chi2 only works with classification tasks, not regression. Predicting the sale price is a regression task, hence the error.
Try f_regression in place of chi2.
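A minimal sketch of that swap, assuming all features are numeric. The data here is synthetic (generated with make_regression) and only stands in for the wrangled Ames data; the names X, y, price and the shift applied before the log are illustrative, not taken from the question. A likely reason the untransformed target appeared to work is that integer prices get treated as class labels, while the log-transformed floats are continuous and get rejected.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic stand-in for the cleaned dataset (assumption: numeric features only).
X, price = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

# Shift the target so it is strictly positive, then log-transform it,
# mimicking a log-transformed 'SalePrice'.
y = np.log(price - price.min() + 1.0)

# f_regression scores each feature against a continuous target,
# so a float target is fine here.
selector = SelectKBest(score_func=f_regression, k=2)
X_top2 = selector.fit_transform(X, y)
top2_idx = selector.get_support(indices=True)  # column indices of the kept features

# Same idea for the 15-feature model.
X_top15 = SelectKBest(score_func=f_regression, k=15).fit_transform(X, y)

print(top2_idx, X_top2.shape, X_top15.shape)
```

For the third model, passing k="all" keeps every feature, so the same pipeline can be reused without a separate code path.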