python pandas dataframe scikit-learn sklearn-pandas

How can I convert the StandardScaler() transformation back to dataframe?

I'm working with a model, and after splitting into train and test, I want to apply StandardScaler(). However, this transformation converts my data into an array and I want to keep the format I had before. How can I do this?

Basically, I have:

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X = df[features]
y = df[["target"]]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42
)

sc = StandardScaler()
X_train_sc = sc.fit_transform(X_train)
X_test_sc = sc.transform(X_test)

How can I get X_train_sc back to the format that X_train had?

Update: I don't want to undo the scaling on X_train_sc. I just want X_train_sc to be a dataframe, in the easiest possible way.

Solution

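As you mentioned, applying the scaler returns a NumPy array. To get a dataframe back, wrap the result in a new one, reusing the column names and the row index of the input (train_test_split shuffles the rows, so without the index the scaled frame would no longer line up with y_train):

import pandas as pd
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
# Reuse the column names and row index of the input frames
X_train_sc = pd.DataFrame(
    sc.fit_transform(X_train), columns=X_train.columns, index=X_train.index
)
X_test_sc = pd.DataFrame(
    sc.transform(X_test), columns=X_test.columns, index=X_test.index
)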
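
A note on the row index when rebuilding the frame by hand: train_test_split shuffles the rows, so X_train keeps its original, non-contiguous labels, whereas pd.DataFrame(array, columns=cols) with no index argument numbers the rows 0..n-1. Passing index=X_train.index keeps the scaled frame aligned with y_train. Here is a minimal sketch on made-up toy data; the helper name scale_to_frame is hypothetical, not part of pandas or scikit-learn.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def scale_to_frame(scaler, X, fit=False):
    """Hypothetical helper: scale X and return a DataFrame with X's labels."""
    values = scaler.fit_transform(X) if fit else scaler.transform(X)
    return pd.DataFrame(values, columns=X.columns, index=X.index)


# Toy data, made up for illustration only.
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "b": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
        "target": [0, 1, 0, 1, 0, 1],
    }
)
X = df[["a", "b"]]
y = df[["target"]]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.5, random_state=42
)

sc = StandardScaler()
X_train_sc = scale_to_frame(sc, X_train, fit=True)  # fit on train only
X_test_sc = scale_to_frame(sc, X_test)              # reuse the train statistics

# The scaled frames keep the shuffled row labels, so they still line up with y.
print(X_train_sc.index.equals(y_train.index))
print(list(X_train_sc.columns))
```

With the index carried over, operations that align on labels (for example X_train_sc.join(y_train)) match each scaled row with its own target.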

2022 Update

As of scikit-learn version 1.2.0, you can use the set_output API to configure transformers to return pandas DataFrames directly (see the set_output example in the scikit-learn documentation).

The above example would simplify as follows:

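from sklearn.preprocessing import StandardScaler

# set_output makes fit_transform/transform return DataFrames,
# so the column names no longer need to be passed by hand
sc = StandardScaler().set_output(transform="pandas")
X_train_sc = sc.fit_transform(X_train)
X_test_sc = sc.transform(X_test)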