I am working on a problem where I need to predict one of several output classes from movement sensor data using an LSTM. There are two different sensors, each with three channels and different units of measurement. For each recording, I am using min-max normalization to scale each sensor's amplitudes to the range [0, 1] (normalizing each sensor individually).
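For reference, here is a minimal sketch of the per-recording, per-sensor min-max scaling I mean (the (timesteps, 6) shape and the column layout, with columns 0-2 from sensor 1 and columns 3-5 from sensor 2, are only assumptions for illustration):

```python
import numpy as np

def minmax_per_sensor(recording, sensor_slices=((0, 3), (3, 6)), eps=1e-8):
    """Scale each sensor's three channels to [0, 1] within one recording.

    `recording` is assumed to have shape (timesteps, 6), with the first
    three columns from sensor 1 and the last three from sensor 2.
    """
    out = recording.astype(np.float64).copy()
    for start, stop in sensor_slices:
        block = out[:, start:stop]
        lo, hi = block.min(), block.max()
        # eps guards against a flat (constant) sensor block
        out[:, start:stop] = (block - lo) / (hi - lo + eps)
    return out
```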
Doing this, I see that with normalization my network converges faster to its final accuracy, but the final performance is significantly lower than with non-normalized data for the same network settings.
From what I understand, normalization has the advantage of helping with training, but if the non-normalized data gives a performance advantage, is it really necessary to stick with a lower-performing network that takes normalized inputs? I am not that experienced and would like other people to comment on this.
Thanks!
I would use the better-performing network if that is the main priority. The goal of normalization is generally just to keep your loss from exploding during training, so it often improves results when the input values are very large. However, when the values are already small, normalization can sometimes make things worse. It is also possible that your normalized range of values is too small; you might want to try normalizing to (0, 2) or an even larger range. But if performance is already satisfactory without normalization, I wouldn't bother.
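If you want to experiment with a larger range, a small variation of the same per-recording scaling could look like this (a sketch only; the (0, 2) target is just the suggestion above, and the column layout is assumed as in the question):

```python
import numpy as np

def minmax_to_range(recording, target=(0.0, 2.0),
                    sensor_slices=((0, 3), (3, 6)), eps=1e-8):
    """Rescale each sensor block of one recording to an arbitrary (lo, hi) range."""
    lo, hi = target
    out = recording.astype(np.float64).copy()
    for start, stop in sensor_slices:
        block = out[:, start:stop]
        b_min, b_max = block.min(), block.max()
        scaled01 = (block - b_min) / (b_max - b_min + eps)  # first map to [0, 1]
        out[:, start:stop] = lo + scaled01 * (hi - lo)      # then stretch to [lo, hi]
    return out
```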