In the context of some neural network research, I'm evaluating several approaches to implementing these networks and deciding which library to use. Currently I'm comparing TensorFlow and Theano, and I'm struggling to get TensorFlow to perform well. Here is my simple Hello-Gradient benchmark. It just optimizes a scalar multiplication with one coefficient.
```python
import time

class Timer:
    """Context manager that prints the elapsed time spent inside its body."""
    def __init__(self, what):
        self.what = what
    def __enter__(self):
        self.t1 = time.perf_counter()  # monotonic, high-resolution clock for intervals
        return self
    def __exit__(self, t, v, tb):
        t2 = time.perf_counter()
        print("{0} runs {1:.4f} seconds".format(self.what, t2 - self.t1))

def run_tensorflow():
    import tensorflow as tf  # TensorFlow 1.x API (placeholders and sessions)
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
    a = tf.Variable([1.], dtype=tf.float32)  # the 2nd positional arg is `trainable`, not dtype
    loss = (y - a * x) ** 2
    step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
    sess = tf.Session()
    # Initialize after the optimizer is built so any variables it creates are covered.
    sess.run(tf.global_variables_initializer())
    def one_step():
        sess.run(step, {x: 1., y: 0.})
    with Timer('tensorflow') as t:
        result = [one_step() for n in range(1000)]

def run_theano():
    import theano as th
    x = th.tensor.dscalar()
    y = th.tensor.dscalar()
    a = th.tensor.dscalar()
    l = a * x
    loss = (y - l) ** 2
    dloss = th.tensor.grad(loss, a)
    dloss_f = th.function([x, y, a], dloss)
    a = [1.]  # plain Python state for the coefficient, updated by hand
    def one_step():
        a[0] -= 0.01 * dloss_f(1., 0., a[0])
    with Timer('theano') as t:
        result = [one_step() for n in range(1000)]

run_tensorflow()
run_theano()
```
I'm running this program on the CPU, with both packages installed via pip. Running times are 0.36 and 0.043 seconds for TensorFlow and Theano, respectively. I see similar performance differences for real networks, where the matrix-multiplication overhead should dominate; still, TensorFlow is significantly slower.
I want to know whether I'm using TensorFlow wrongly for what I'm trying to do. Should I not call the `run()` method within a loop?
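TF and Theano are designed for handling large objects, on the order of 1M elements. Benchmarking their handling of scalars is not particularly relevant, because with scalars the fixed per-call overhead, rather than the arithmetic, dominates the measured time.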
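A back-of-the-envelope way to see why, assuming each call pays a fixed overhead plus a cost that grows linearly with the number of elements. The two cost constants below are hypothetical, picked only to show the shape of the curve; they are not measurements of TF or Theano.

```python
def overhead_fraction(per_call_overhead_s, per_element_s, n_elements):
    """Fraction of one call's time spent on fixed overhead, under a linear cost model."""
    compute = per_element_s * n_elements
    return per_call_overhead_s / (per_call_overhead_s + compute)

# Hypothetical costs (assumed, not measured).
OVERHEAD = 1e-4   # seconds of fixed overhead per call
PER_ELEM = 1e-9   # seconds of work per element

for n in (1, 1_000, 1_000_000):
    frac = overhead_fraction(OVERHEAD, PER_ELEM, n)
    print(f"{n:>9} elements: {frac:.3f} of the call is overhead")
```

Under these assumptions a scalar call is almost entirely overhead, while at a million elements the overhead shrinks to roughly a tenth of the call, so a scalar benchmark mostly compares the libraries' dispatch costs.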
This is an apples-to-oranges comparison: with TF, you are timing both the compilation and the run time, while in Theano, you are only timing the run time! This is because when you call `theano.function`, it does all the compilation then. In TF, on the other hand, much of this work is shifted to when you first call `sess.run`.
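One way to make the comparison fairer is to time the first call separately from the steady-state calls. Below is a minimal sketch of such a helper; `time_first_and_steady` and the stand-in `lazily_compiled_step` are hypothetical names for illustration, and the stand-in merely simulates a one-off setup cost with `time.sleep` rather than doing any real compilation.

```python
import time

def time_first_and_steady(fn, n_steady=1000):
    """Time fn's first call separately from n_steady later calls.

    Returns (first_call_seconds, mean_seconds_per_steady_call).
    """
    t0 = time.perf_counter()
    fn()  # first call: may include one-off setup such as graph compilation
    first = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(n_steady):
        fn()
    steady = (time.perf_counter() - t0) / n_steady
    return first, steady

# Hypothetical stand-in that pays a one-off cost on its first call only.
_state = {}
def lazily_compiled_step():
    if "compiled" not in _state:
        time.sleep(0.05)  # simulated setup cost
        _state["compiled"] = True
    return 0

first, steady = time_first_and_steady(lazily_compiled_step, n_steady=100)
print(f"first call: {first:.4f} s, steady state: {steady:.6f} s/call")
```

To apply this to the question's benchmark, one could pass the `one_step` closure from `run_tensorflow` or `run_theano` instead of the stand-in. Since Theano compiles inside `theano.function`, its first-call and steady-state numbers would be expected to sit closer together than TensorFlow's.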
That said, there are also realistic scenarios when TF is slower than Theano.