machine-learning scikit-learn scaling data-preprocessing

Is there a way to set the data_min and the data_max in MinMaxScaler()?


I'm currently using MinMaxScaler() on my dataset. However, because my dataset is large, I'm doing a first pass over it in batches to compute the min and max values for my scaler. I'm using partial_fit() to help with this.

Anyway, for some of my features I already know the min and max values. Is there any way I can explicitly inform the scaler about these min and max values?


Solution

  • You could simply create your own function to transform your data:

    def myMinMaxScaler(X, Xmin, Xmax):
        # Scale X to [0, 1] using known bounds; assumes Xmax != Xmin for every feature.
        return (X - Xmin) / (Xmax - Xmin)
    

    Another option is to append rows containing the known min and max values to the end of your batches so that partial_fit() picks them up, and then remove the added rows after the transformation.
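
A minimal sketch of the second idea, as a slight variation: the known bounds are passed as two extra rows in their own partial_fit() call, so nothing has to be removed afterwards because those rows are never transformed. The bounds and the toy batches are made up for illustration, and the sketch assumes the bounds are known for every feature. For a feature whose bounds are unknown, the extra rows would need a value that lies inside the observed range of that feature.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical known bounds for a two-feature dataset (assumed for illustration).
known_min = np.array([0.0, -10.0])
known_max = np.array([100.0, 10.0])

# Two small batches standing in for a large dataset read in chunks.
batches = [
    np.array([[20.0, -5.0], [40.0, 0.0]]),
    np.array([[60.0, 2.5], [80.0, 5.0]]),
]

scaler = MinMaxScaler()

# Feed the known bounds as two extra rows, then the real batches.
scaler.partial_fit(np.vstack([known_min, known_max]))
for batch in batches:
    scaler.partial_fit(batch)

# The extra rows only influenced the fit, so the real data is transformed as usual.
scaled = scaler.transform(batches[0])

# Same result as the hand-written formula with the known bounds.
expected = (batches[0] - known_min) / (known_max - known_min)
```

Note that if a batch contains values outside the supplied bounds, partial_fit() will widen data_min_ and data_max_ accordingly, so this only pins the range when the stated bounds really are the extremes.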