How to scale one input example?


I have a dataset whose rows look like this: [81.2819, 5636.209677, 9957.279495]. These are the three input features to my neural network. Let's say the shape of my entire dataset is (10,000 x 3). When I scale the entire dataset using the following lines of code:

from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler()
scaled_ds = scaler.fit_transform(dataset)

everything works fine. But when I scale just one row, like the one above, I get zeros:

array([[0., 0., 0.]])

Can anyone explain why?


Solution

  • According to the documentation:

    X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    X_scaled = X_std * (max - min) + min
    

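    where min, max = feature_range, which defaults to (0, 1). For a single sample, X.min(axis=0) coincides with X.max(axis=0), so the range in the denominator is zero. scikit-learn handles a zero range by replacing it with 1 instead of dividing by zero, so X_std is 0 and X_scaled equals min, which is 0 for the default feature_range.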

    This is why feature-wise scaling cannot be meaningfully defined from a single data sample. On the other hand, if you have already fitted the scaler on your dataset and just want to transform a new example, use:

    scaled_sample = scaler.transform(sample)
    

    i.e. just use the pre-obtained min/max values rather than attempt to fit new ones.
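
To make the difference concrete, here is a minimal sketch. The random dataset is only a hypothetical stand-in for the real (10,000 x 3) data, and the names manual and refit_sample are introduced purely for illustration. It also assumes the new row is passed as a 2-D array of shape (1, 3), since transform expects 2-D input.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the (10,000 x 3) dataset from the question.
rng = np.random.default_rng(0)
dataset = rng.uniform(low=0.0, high=10_000.0, size=(10_000, 3))

# The single row from the question, shaped (1, 3) because transform expects 2-D input.
sample = np.array([[81.2819, 5636.209677, 9957.279495]])

# Fit once on the full dataset; this stores the per-feature min and max.
scaler = MinMaxScaler()
scaled_ds = scaler.fit_transform(dataset)

# Reuse the stored min/max to scale the new row.
scaled_sample = scaler.transform(sample)

# The same result computed by hand from the documented formula,
# using the default feature_range of (0, 1).
data_min = dataset.min(axis=0)
data_max = dataset.max(axis=0)
manual = (sample - data_min) / (data_max - data_min)

# Fitting a fresh scaler on the single row alone reproduces the
# all-zero output from the question.
refit_sample = MinMaxScaler().fit_transform(sample)

print(scaled_sample)
print(refit_sample)  # [[0. 0. 0.]]
```

The design point is that fit (or fit_transform) should only ever see the data you want the min/max to come from, while transform can be called on any number of new rows afterwards.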