I am implementing a simple gradient descent algorithm using tensors. It learns two parameters, m and c.
The plain Python code for it is:
```python
for i in range(epochs):
    Y_pred = m*X + c  # The current predicted value of Y
    D_m = (-2/n) * sum(X * (Y - Y_pred))  # Derivative wrt m
    D_c = (-2/n) * sum(Y - Y_pred)  # Derivative wrt c
    m = m - L * D_m  # Update m
    c = c - L * D_c  # Update c
    print(m, c)
```
Output for Python:

```
0.7424335285442664 0.014629895049575754
1.1126970531591416 0.021962519495058154
1.2973530613155333 0.025655870599552183
1.3894434413955663 0.027534253868790198
1.4353697670010162 0.028507481513901086
```
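The update rule above is gradient descent on the mean squared error. Assuming that is the loss being minimised (the question does not say so explicitly, but the derivatives match it), with $\hat{Y}_i = mX_i + c$, the two gradients and the updates are:

$$
E(m, c) = \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - \hat{Y}_i\bigr)^2
$$

$$
D_m = \frac{\partial E}{\partial m} = -\frac{2}{n}\sum_{i=1}^{n} X_i\bigl(Y_i - \hat{Y}_i\bigr),
\qquad
D_c = \frac{\partial E}{\partial c} = -\frac{2}{n}\sum_{i=1}^{n}\bigl(Y_i - \hat{Y}_i\bigr)
$$

$$
m \leftarrow m - L\,D_m, \qquad c \leftarrow c - L\,D_c
$$

Both $D_m$ and $D_c$ are evaluated at the same (old) pair $(m, c)$ before either parameter is changed.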
TensorFlow equivalent code:

```python
# Graph of gradient descent
y_pred = m*x + c
d_m = (-2/n) * tf.reduce_sum(x*(y-y_pred))
d_c = (-2/n) * tf.reduce_sum(y-y_pred)
upm = tf.assign(m, m - learning_rate * d_m)
upc = tf.assign(c, c - learning_rate * d_c)

# Starting session
sess = tf.Session()

# Training for epochs
for i in range(epochs):
    sess.run(y_pred)
    sess.run(d_m)
    sess.run(d_c)
    sess.run(upm)
    sess.run(upc)
    w = sess.run(m)
    b = sess.run(c)
    print(w, b)
```
Output for TensorFlow:

```
0.7424335285442664 0.007335550424492317
1.1127687194584988 0.011031122807663662
1.2974962163433057 0.012911024540805463
1.3896400798226038 0.013885244876397126
1.4356019721347115 0.014407698787092268
```
The parameter m has essentially the same value in both versions, but parameter c differs, even though the implementation is the same for both.
The output shows the first 5 values of m and c. The value of c computed with tensors is approximately half of the value from the plain Python version.
I don't know where my mistake is.
To recreate the entire output, see the repo containing the data along with both implementations.
The repo also contains an image of the graph obtained through TensorBoard, in the events directory.
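The problem is that, in the TF implementation, the updates are not being performed atomically. In other words, the implementation is updating m and c in an interleaved manner (e.g. the new value of m is being used when updating c). Each separate `sess.run` call re-evaluates the graph, so `sess.run(upc)` recomputes `y_pred`, and therefore `d_c`, using the value of m that `sess.run(upm)` has just assigned. To make the updates atomic, run `upm` and `upc` simultaneously, in a single call:

```python
sess.run([upm, upc])
```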
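To see the effect without TensorFlow, here is a minimal NumPy sketch using hypothetical toy data (the question's real dataset is in the linked repo). The helper names `simultaneous_step` and `interleaved_step` are illustrative only. `simultaneous_step` computes both gradients from the same old (m, c), like the plain Python loop or `sess.run([upm, upc])`; `interleaved_step` mimics running `upm` and then `upc` in separate `sess.run` calls, so the gradient for c is computed with the already-updated m.

```python
import numpy as np


def simultaneous_step(m, c, X, Y, L):
    """One step where both gradients use the same (old) m and c."""
    n = len(X)
    Y_pred = m * X + c
    D_m = (-2 / n) * np.sum(X * (Y - Y_pred))
    D_c = (-2 / n) * np.sum(Y - Y_pred)
    return m - L * D_m, c - L * D_c


def interleaved_step(m, c, X, Y, L):
    """Mimics sess.run(upm) followed by a separate sess.run(upc)."""
    n = len(X)
    D_m = (-2 / n) * np.sum(X * (Y - (m * X + c)))
    m = m - L * D_m
    # The prediction is recomputed with the already-updated m before c is updated
    D_c = (-2 / n) * np.sum(Y - (m * X + c))
    c = c - L * D_c
    return m, c


# Hypothetical toy data, not the question's dataset
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.0, 4.0, 6.0, 8.0])
L = 0.01

m_s = c_s = m_i = c_i = 0.0
for epoch in range(5):
    m_s, c_s = simultaneous_step(m_s, c_s, X, Y, L)
    m_i, c_i = interleaved_step(m_i, c_i, X, Y, L)
    print(f"simultaneous: {m_s:.6f} {c_s:.6f} | interleaved: {m_i:.6f} {c_i:.6f}")
```

On the first step both versions produce the same m (0.3 with this toy data), but the interleaved c is smaller (0.085 instead of 0.1), which is the same pattern as in the question's TensorFlow output.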