I am using Keras, so the shape of my data is (batch_size, timesteps, input_dim), but StandardScaler only accepts 2D data.
One solution I thought of was to use partial_fit and then transform:
scaler = StandardScaler()
for sample in range(data.shape[0]):
    scaler.partial_fit(data[sample])
for sample in range(data.shape[0]):
    data[sample] = scaler.transform(data[sample])
Is this a correct/efficient approach?
You have two possibilities:
import numpy as np
data = np.random.randn(bsize * time * feats).reshape((bsize, time, feats))
Version 1 is doing what you say:
scaler = StandardScaler()
for sample in range(data.shape[0]):
    scaler.partial_fit(data[sample])
for sample in range(data.shape[0]):
    data[sample] = scaler.transform(data[sample])
Another possibility (Version 2) is to flatten the array to 2D, fit and transform it, and then reshape it back:
scaler = StandardScaler()
data = scaler.fit_transform(data.reshape((bsize*time,feats))).reshape((bsize,time,feats))
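Both versions compute the same per-feature mean and variance, so they produce the same result up to floating-point error. A minimal self-contained sketch (the dimensions here are illustrative, not from the original post):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative dimensions (assumed for the example)
bsize, time, feats = 8, 5, 3
rng = np.random.default_rng(0)
data = rng.standard_normal((bsize, time, feats))

# Version 1: incremental fit per sample, then per-sample transform
scaler1 = StandardScaler()
for sample in range(data.shape[0]):
    scaler1.partial_fit(data[sample])
v1 = np.stack([scaler1.transform(data[s]) for s in range(data.shape[0])])

# Version 2: flatten to 2D, fit_transform once, reshape back
scaler2 = StandardScaler()
v2 = scaler2.fit_transform(data.reshape((bsize * time, feats))).reshape((bsize, time, feats))

print(np.allclose(v1, v2))  # True: both apply the same per-feature scaling
```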
On my computer:
Version 1 takes 0.8759770393371582 seconds
Version 2 takes 0.11733722686767578 seconds
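The gap comes from Version 1 looping in Python and calling partial_fit/transform once per sample, while Version 2 does a single vectorized fit_transform. If you want to reproduce the comparison yourself, here is a hedged benchmark sketch (the array sizes are assumptions, since the original post does not state them):

```python
import timeit
import numpy as np
from sklearn.preprocessing import StandardScaler

# Assumed sizes for the benchmark
bsize, time_len, feats = 1000, 50, 10
data = np.random.randn(bsize * time_len * feats).reshape((bsize, time_len, feats))

def version1(d):
    # Per-sample incremental fit, then per-sample transform
    scaler = StandardScaler()
    for sample in range(d.shape[0]):
        scaler.partial_fit(d[sample])
    return np.stack([scaler.transform(d[s]) for s in range(d.shape[0])])

def version2(d):
    # Single vectorized fit_transform on the flattened 2D view
    scaler = StandardScaler()
    return scaler.fit_transform(d.reshape((bsize * time_len, feats))).reshape((bsize, time_len, feats))

print("Version 1:", timeit.timeit(lambda: version1(data), number=3))
print("Version 2:", timeit.timeit(lambda: version2(data), number=3))
```

The absolute numbers will differ from machine to machine, but the vectorized version should be consistently faster.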