deep-learning data-science artificial-intelligence normalization quantitative-finance

Deep Learning Data Normalization


I’m working with different types of financial data inputs for my models and I would like to know more about how to normalize them.

In particular, working with some technical indicators, I’ve normalized them to have a range between 0 and 1.

Others were normalized to have a range between -1 and 1.

What is your experience with mixed normalized data?

Is it acceptable to have these two ranges, or is it always better to have the whole training dataset in a single range, e.g. [0, 1]?


Solution

  • It is important to note that when we discuss data normalization, we are usually referring to the normalization of continuous data. Categorical data (usually) doesn't require it.

    Furthermore, not all ML methods need normalized data to function well. Random Forests and Gradient Boosting Machines, for example, do not, while Support Vector Machines and Neural Networks do.

    The reasons for input data normalization are dependent on the methods themselves. For SVMs, data normalization is done to ensure that input features are given equal importance in influencing the model's decisions. For neural networks, we normalize data to allow the gradient descent process to converge smoothly.
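To make the SVM point concrete, here is a minimal NumPy sketch (not part of the original answer) of how an unscaled feature can dominate a Euclidean distance, which is what distance-based kernels build on. The toy data and the helper names `feature_share` and `min_max` are hypothetical and only for illustration.

```python
import numpy as np

# Hypothetical toy data: column 0 is a price-like feature on a large scale,
# column 1 is an oscillator-like feature that already lies in [0, 1].
X = np.array([
    [100.0, 0.1],
    [101.0, 0.9],
    [150.0, 0.1],
])

def feature_share(x):
    """Fraction of the total pairwise squared Euclidean distance
    contributed by each feature (column)."""
    diff = x[:, None, :] - x[None, :, :]        # shape (n, n, d)
    per_feature = (diff ** 2).sum(axis=(0, 1))  # shape (d,)
    return per_feature / per_feature.sum()

def min_max(x):
    """Rescale each column to [0, 1] (assumes no constant columns)."""
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

share_raw = feature_share(X)
share_scaled = feature_share(min_max(X))

print("raw:   ", share_raw)     # column 0 accounts for almost all of the distance
print("scaled:", share_scaled)  # both columns contribute roughly equally
```

On this toy data the raw distances are almost entirely determined by the large-scale column, while after rescaling each column contributes about half.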

    Finally, to answer your question: if you are working with continuous data and modeling it with a neural network, just make sure the normalized features are on comparable scales (even if their ranges are not identical, e.g. [0, 1] and [-1, 1]), because that is what determines how easily the gradient descent process converges. If you are working with an SVM, it is better to normalize all features to a single range, so that the similarity/distance function your SVM uses gives them equal importance. With other methods, the need for data normalization, whatever the ranges, may disappear entirely. Ultimately, it depends on the modeling technique you are using!
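As a rough illustration of the mixed-range case from the question, the sketch below rescales one feature to [0, 1] and another to [-1, 1] and checks that both end up on comparable scales. The synthetic `rsi_like` and `macd_like` series and the helper `scale_to_range` are assumptions made up for this example, not real indicator data or a library API.

```python
import numpy as np

def scale_to_range(x, low, high):
    """Linearly rescale a 1-D array into [low, high] (min-max scaling)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Constant feature: map everything to the middle of the target range.
        return np.full_like(x, (low + high) / 2.0)
    return low + (x - x_min) * (high - low) / (x_max - x_min)

# Synthetic stand-ins for two indicators (assumed for illustration only).
rng = np.random.default_rng(0)
rsi_like = rng.uniform(0.0, 100.0, size=1000)  # bounded, non-negative
macd_like = rng.normal(0.0, 2.5, size=1000)    # signed, centred near zero

rsi_scaled = scale_to_range(rsi_like, 0.0, 1.0)     # range [0, 1]
macd_scaled = scale_to_range(macd_like, -1.0, 1.0)  # range [-1, 1]

# Stack into a feature matrix and inspect the per-column scale.
features = np.column_stack([rsi_scaled, macd_scaled])
print("min:", features.min(axis=0))
print("max:", features.max(axis=0))
print("std:", features.std(axis=0))
```

The two columns have different ranges but the same order of magnitude, which is the property the answer says matters for a neural network. In a real pipeline the minimum and maximum would normally be computed on the training split only and then reused for validation and test data.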

    Credit to @user3666197 for the helpful feedback in the comments.