Do you apply min max scaling separately on training and test data?

While applying min max scaling to normalize your features, do you apply min max scaling on the entire dataset before splitting it into training, validation and test data?

Or do you split first and then apply min max on each set, using the min and max values from that specific set?

Lastly, when making a prediction on a new input, should the features of that input be normalized using the min and max values from the training data before being fed into the network?

Solution

Split it, then scale. Imagine it this way: you have no idea what real-world data looks like, so you couldn't scale the training data to it. Your test data is the surrogate for real-world data, so you should treat it the same way.

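To reiterate: split, compute the min and max on your training data only, then use those same training values to scale the validation and test data. The same goes for a new input at prediction time: normalize it with the training min and max before feeding it into the network. Scaled values outside [0, 1] on unseen data are expected whenever a feature falls outside the range seen in training.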
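Here is a minimal sketch of that order of operations in plain NumPy. The data, the 80/20 split, and the names `train_min`, `train_max` and `scale` are made up for illustration; they are not from the question.

```python
import numpy as np

# Made-up data for illustration: 100 samples, 3 features in [0, 100).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 100.0, size=(100, 3))

# 1. Split first (a simple 80/20 split, assumed here).
X_train, X_test = X[:80], X[80:]

# 2. Compute the scaling parameters from the training data only.
train_min = X_train.min(axis=0)
train_max = X_train.max(axis=0)
# Guard against constant features to avoid division by zero.
train_range = np.where(train_max > train_min, train_max - train_min, 1.0)

def scale(X_any):
    """Min-max scale using the training min and range (hypothetical helper)."""
    return (np.asarray(X_any, dtype=float) - train_min) / train_range

# 3. Apply the same training-derived scaling everywhere.
X_train_scaled = scale(X_train)   # lies exactly in [0, 1]
X_test_scaled = scale(X_test)     # may fall slightly outside [0, 1]

# 4. A new input at prediction time gets the same treatment.
x_new_scaled = scale([[50.0, 120.0, -5.0]])
```

If you use scikit-learn, `MinMaxScaler` follows the same pattern: call `fit` (or `fit_transform`) on the training set, then only `transform` on the validation set, the test set, and new inputs.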
