
TensorFlow: Normalization inside or outside the graph?


I have a time series dataset that I want to normalize before training an RNN on it. I scan the data and create a TFRecord file, and training then reads from that file.

My question is: how should I decide between normalizing the data up front and writing the normalized values to the TFRecord file, versus writing raw data to the TFRecord file and normalizing each example during training as it is read?

Right now I normalize the data up front and write the normalized values to the TFRecord file. I did it this way because I assumed that normalizing during training would increase computation time: the RNN reads the same examples from the TFRecord file repeatedly, so they would be normalized over and over.

What considerations would affect my decision to go one way or the other?


Solution

  • There is no general rule for this. As you say, it comes down to which setup performs better.

    If you pre-normalize the data, training will run somewhat faster, but you need to store the normalized copy of the data. On the other hand, if you change the training data, you will have to run the normalization again. Also, normalization is basically one subtraction (of the mean) and one division (by the standard deviation) per value, provided you pre-calculate the mean and standard deviation, so it is not a terribly time-consuming operation.

    So I'd suggest trying it both ways: run your network for two minutes each way and see how big the performance difference is.

    One more thought: if you use batch normalization on the input, the input is already being normalized, and this exercise is unnecessary.
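
As a rough sketch of the two options discussed above, assuming NumPy arrays of shape (num_sequences, time_steps, num_features) stand in for the parsed records. The helper names `fit_stats` and `normalize` and the synthetic data are hypothetical, chosen only for illustration; they are not from the question.

```python
import numpy as np


def fit_stats(train, eps=1e-8):
    """Compute per-feature mean and std once, over the whole training set.

    `train` is assumed to have shape (num_sequences, time_steps, num_features).
    """
    mean = train.mean(axis=(0, 1))
    std = train.std(axis=(0, 1))
    # Clamp the std so a constant feature does not cause division by zero.
    return mean, np.maximum(std, eps)


def normalize(x, mean, std):
    """One subtraction and one division per value."""
    return (x - mean) / std


# Synthetic stand-in for the real time series (assumption for this sketch).
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=3.0, size=(100, 20, 4))

mean, std = fit_stats(train)

# Option A: normalize once up front, then write the result to the record file.
pre_normalized = normalize(train, mean, std)

# Option B: keep raw records and normalize each example as it is read.
on_the_fly = np.stack([normalize(example, mean, std) for example in train])
```

Both paths produce the same values; the only difference is where the subtraction and division are paid for. With raw records, a function like `normalize` would typically be applied per example in the input pipeline, for instance through `tf.data.Dataset.map`, with `mean` and `std` computed ahead of time.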