machine-learning, scikit-learn, jupyter-notebook, feature-engineering

dead kernel when doing feature engineering?


I am working on a prediction problem. My training set has around 8,700 samples and around 1,000 features. I tried different models, but the results are still highly biased (underfitting), so I decided to add new features. I added some lagged versions of the existing features and then used sklearn's polynomial tools to generate polynomial features (degree=2).

from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(2)
X_poly = poly.fit_transform(X)  # dense array with all degree-2 terms
X = pd.DataFrame(X_poly, columns=poly.get_feature_names_out(), index=X.index)
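As a sanity check on how wide the output gets: with n input columns, degree-2 `PolynomialFeatures` (with the default bias column) emits (n + 1)(n + 2) / 2 columns, which can be computed before committing the memory:

```python
from math import comb

def poly2_n_output_features(n_features: int) -> int:
    # bias (1) + linear terms (n) + squares and pairwise products,
    # which totals C(n_features + 2, 2)
    return comb(n_features + 2, 2)

print(poly2_n_output_features(1000))  # 501501 columns from 1,000 inputs
```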

Now I have around 490,000 features. Next, when I try to scale the features,

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X)

Jupyter Notebook reports a "Dead kernel" error and I cannot go any further.


What should I do? Any suggestions?


Solution

  • You need to do batch processing: fit the scaler incrementally with `partial_fit`, then transform in batches as well (the transform pass also needs a loop):

    scaler = StandardScaler()

    n = X.shape[0]  # number of rows
    batch_size = 1000
    i = 0

    # First pass: accumulate the running mean and variance batch by batch
    while i < n:
        partial_size = min(batch_size, n - i)
        scaler.partial_fit(X[i:i + partial_size])
        i += partial_size

    # Second pass: scale each batch; overwriting in place avoids
    # allocating a second full-size copy of the matrix
    i = 0
    while i < n:
        partial_size = min(batch_size, n - i)
        X[i:i + partial_size] = scaler.transform(X[i:i + partial_size])
        i += partial_size
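Batching avoids the scaler's intermediate copies, but it is worth checking whether the matrix itself fits in RAM at all. A rough estimate, assuming float64 and the sizes reported in the question:

```python
rows, cols = 8_700, 490_000   # sizes reported in the question
bytes_per_float64 = 8

size_gib = rows * cols * bytes_per_float64 / 2**30
print(f"{size_gib:.1f} GiB")  # roughly 32 GiB
```

If that exceeds available memory, the kernel will die regardless of batching; in that case consider casting to float32 (halves the footprint) or reducing the feature set before the polynomial expansion.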