machine-learning scikit-learn scaling data-preprocessing

Is there a way to set the data_min and the data_max in MinMaxScaler()?

I'm currently using MinMaxScaler() on my dataset. However, because my dataset is large, I'm doing a first pass over it in batches to compute the min and max values for my scaler, and I'm using partial_fit() to help with this.

For some of my features, though, I already know the min and max values. Is there any way I can explicitly inform the scaler about these values?

Solution

You could simply create your own function to transform your data:

def myMinMaxScaler(X, Xmin, Xmax):
    return (X - Xmin) / (Xmax - Xmin)
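
As a rough usage sketch, the function can be applied to a whole batch at once if the bounds are passed as NumPy arrays with one entry per feature, since broadcasting then scales each column by its own min and max. The bound values and the batch below are made up for illustration, and the sketch assumes Xmax differs from Xmin for every feature (otherwise the division fails).

```python
import numpy as np

def myMinMaxScaler(X, Xmin, Xmax):
    return (X - Xmin) / (Xmax - Xmin)

# Hypothetical known bounds for two features (illustrative values only).
known_min = np.array([0.0, -50.0])
known_max = np.array([100.0, 50.0])

# One batch of samples: rows are samples, columns are features.
batch = np.array([[25.0, 0.0],
                  [100.0, -50.0]])

# Broadcasting applies each column's own min and max.
scaled = myMinMaxScaler(batch, known_min, known_max)
print(scaled)
```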

Another option is to append rows containing the known min and max values to the end of your batches, so that the scaler picks them up while fitting, and then remove the added rows after the transformation.
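
A minimal sketch of that idea with partial_fit(), assuming bounds are known only for some columns. The helper with_bound_rows, the column indices, and all values are hypothetical. The two extra rows are copies of a real row from the batch, so the columns without known bounds are not affected, and only the known columns are overwritten with their min and max.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical setup: bounds are known for column 0 only;
# column 1 is still learned from the data.
known_cols = [0]
known_min = np.array([0.0])
known_max = np.array([100.0])

def with_bound_rows(batch):
    """Return the batch with two extra rows carrying the known min and max."""
    # Copy a real row twice so the other columns keep in-range values.
    extra = np.repeat(batch[:1], 2, axis=0)
    extra[0, known_cols] = known_min
    extra[1, known_cols] = known_max
    return np.vstack([batch, extra])

# Illustrative batches (rows are samples, columns are features).
batches = [np.array([[25.0, 3.0], [40.0, 7.0]]),
           np.array([[60.0, 5.0]])]

scaler = MinMaxScaler()
for batch in batches:
    # The extra rows are only passed to partial_fit.
    scaler.partial_fit(with_bound_rows(batch))

# Transform the original batch; no added rows to strip here.
scaled = scaler.transform(batches[0])
print(scaler.data_min_, scaler.data_max_)
```

In this variant the padded batches are used only for fitting, so there is nothing to remove after transform(). Note that if the real data ever falls outside the stated bounds, the scaler will still widen data_min_ and data_max_ to cover it.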
