Tags: python, tensorflow, machine-learning

What is the role of normalization function in TensorFlow?


I am learning machine learning and tried to build a simple TensorFlow model. When I trained the model, the loss was about 10:

5s 83us/step - loss: 9.6847 - acc: 0.3971

Code of the model:

import tensorflow as tf
from tensorflow import keras

# x_train, y_train: MNIST training data loaded earlier
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])

model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=3)

But then I normalized the dataset with this line:

x_train = keras.utils.normalize(x_train, axis=1)

After that, the loss fell below 1.

My question is: what does normalization do to have such a huge impact?


Solution

  • "What does it do to make such a huge impact?" It normalizes the training data so that each sample has unit L2 norm (the default `order=2` in the implementation of `keras.utils.normalize`). This is done so that no single sample's magnitude dominates how the updates are made to the weights. See the answer to this related question as well, where the reason for normalization is explained using a two-feature logistic regression example.
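To make the effect concrete, here is a minimal NumPy sketch of what L2 normalization along an axis does (shown on a small 2D array for clarity; `keras.utils.normalize` applies the same idea along the axis you pass). The function name `l2_normalize` is my own, not part of Keras:

```python
import numpy as np

def l2_normalize(x, axis=1, eps=1e-12):
    # Divide each slice along `axis` by its L2 norm,
    # mirroring keras.utils.normalize's default order=2.
    norm = np.sqrt(np.sum(np.square(x), axis=axis, keepdims=True))
    return x / np.maximum(norm, eps)  # eps guards against division by zero

x = np.array([[3.0, 4.0],
              [6.0, 8.0]])   # second row is 2x the first
xn = l2_normalize(x, axis=1)
# Both rows become [0.6, 0.8]: same direction, unit length
```

Note how the two rows, which differ only in scale, map to the same unit vector, so a sample's raw magnitude no longer influences the size of the gradient updates it produces. For image data like MNIST, simply dividing by 255 (`x_train / 255.0`) achieves a similar rescaling and is also common.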