I have a matrix which I'm trying to normalize by transforming each feature column to zero mean and unit standard deviation.
I'm using the following code, but I want to know whether it actually does that or whether it applies a different method.
from sklearn import preprocessing
mat_normalized = preprocessing.normalize(mat_from_df)
sklearn.preprocessing.normalize scales each sample vector to unit norm. (The default axis is 1, not 0.) Here's proof of that:
import numpy as np
from sklearn.preprocessing import normalize
np.random.seed(444)
data = np.random.normal(loc=5, scale=2, size=(15, 2))
# Norm of each row after normalizing
np.linalg.norm(normalize(data), axis=1)
# array([ 1., 1., 1., 1., 1., 1., ...
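To make the row-wise behaviour concrete, here is a minimal sketch, assuming normalize's defaults (norm="l2", axis=1): dividing each row by its own Euclidean length should give the same result. The names rng, row_norms, manual, and col_means are made up for this illustration.

```python
import numpy as np
from sklearn.preprocessing import normalize

rng = np.random.default_rng(444)
data = rng.normal(loc=5, scale=2, size=(15, 2))

# Divide each row by its own Euclidean (L2) length.
row_norms = np.linalg.norm(data, axis=1, keepdims=True)
manual = data / row_norms

# Should agree with normalize() under its defaults (norm="l2", axis=1).
same_as_normalize = np.allclose(manual, normalize(data))

# The columns are not centered by this operation.
col_means = normalize(data).mean(axis=0)
columns_centered = np.allclose(col_means, 0.)
```

If this holds, same_as_normalize is True while columns_centered is False, which is why normalize does not produce zero-mean, unit-variance features.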
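It sounds like you're looking for sklearn.preprocessing.scale, which standardizes each feature column to zero mean and unit variance. (It shifts and rescales the columns; it does not change the shape of their distribution.)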
from sklearn.preprocessing import scale
# Are the scaled column-wise means approx. 0.?
np.allclose(scale(data).mean(axis=0), 0.)
# True
# Are the scaled column-wise stdevs. approx. 1.?
np.allclose(scale(data).std(axis=0), 1.)
# True
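For reference, the same standardization can be written by hand. This is a minimal sketch, assuming scale's defaults (axis=0, with_mean=True, with_std=True); the names rng, manual, and matches_scale are made up for this illustration.

```python
import numpy as np
from sklearn.preprocessing import scale

rng = np.random.default_rng(444)
data = rng.normal(loc=5, scale=2, size=(15, 2))

# Column-wise z-score: subtract each column's mean, then divide by its
# population standard deviation (ddof=0, which is NumPy's default).
manual = (data - data.mean(axis=0)) / data.std(axis=0)

# Should agree with scale() under its defaults.
matches_scale = np.allclose(manual, scale(data))
```

One thing to watch if you do this directly on a DataFrame: pandas' std() defaults to ddof=1, so the result would differ slightly from scale, which uses the population standard deviation.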