python-3.x, svm, scale, feature-extraction, scikit-image

Scaling multiple features with StandardScaler, before or after concatenation?


I have an image data set (with pixel values from 0 to 255) from which I want to extract different features, e.g. HOG features, Gabor filter features, LBP and color histograms. I would like to concatenate these features into a single feature vector:

feature_overall = np.concatenate((feat1, feat2, feat3, feat4), axis=1)

and then train an SVM with this resulting overall feature vector.

I'm using Python and scikit-image (skimage).

I am not sure where I should apply the standard scaler here. Should it be applied to each feature separately, i.e. before all features are concatenated? Or should it be applied to the concatenated result, i.e. to the overall feature vector?

Many thanks for any help.


Solution

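  • StandardScaler scales each column independently to have mean 0 and standard deviation 1. Because each column is treated on its own, it does not matter whether you scale the features before or after concatenation, as long as the scaler is fitted on the same samples in both cases.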

    However, if you were using sklearn.preprocessing.Normalizer(), then it would matter. Normalizer() rescales each row (i.e. each sample) to the same magnitude, namely unit norm (L2/Euclidean by default).

    In that case, I'd apply Normalizer() before concatenating the features: you may want the magnitude of the HOG features to be constant across all images, but you probably don't want the magnitude of the combined HOG, Gabor filter, LBP and color histogram features to be constant.
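
The difference between the two transformers can be checked numerically. The following is a minimal sketch, not a full pipeline: feat_hog and feat_hist are hypothetical random arrays standing in for real extracted features (one row per image), and only two feature blocks are used to keep it short.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-ins for two extracted feature blocks (rows = images).
feat_hog = rng.normal(loc=5.0, scale=2.0, size=(6, 4))
feat_hist = rng.uniform(low=0.0, high=255.0, size=(6, 3))

# StandardScaler works column by column, so the order should not matter.
scaled_then_joined = np.concatenate(
    (
        StandardScaler().fit_transform(feat_hog),
        StandardScaler().fit_transform(feat_hist),
    ),
    axis=1,
)
joined_then_scaled = StandardScaler().fit_transform(
    np.concatenate((feat_hog, feat_hist), axis=1)
)
same_for_scaler = np.allclose(scaled_then_joined, joined_then_scaled)

# Normalizer works row by row, so the order does matter.
normalized_then_joined = np.concatenate(
    (
        Normalizer(norm="l2").fit_transform(feat_hog),
        Normalizer(norm="l2").fit_transform(feat_hist),
    ),
    axis=1,
)
joined_then_normalized = Normalizer(norm="l2").fit_transform(
    np.concatenate((feat_hog, feat_hist), axis=1)
)
same_for_normalizer = np.allclose(normalized_then_joined, joined_then_normalized)

print("StandardScaler gives the same result either way:", same_for_scaler)
print("Normalizer gives the same result either way:", same_for_normalizer)
```

With Normalizer() applied per block, each block of a row has unit length on its own; applied after concatenation, only the whole row has unit length, so a block with large raw values (here the histogram stand-in) dominates the others.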