I have a simple model created with Keras and I need to measure the execution time for prediction per image. Right now I just do this:
import time
start = time.perf_counter()  # time.clock() was removed in Python 3.8
my_model.predict(images_test)
end = time.perf_counter()
print("Time per image: {}".format((end - start) / len(images_test)))
But I noticed that the calculated time per image is larger when len(images_test) is smaller. For example, with len(images_test) = 32 I get 0.06, and with len(images_test) = 1024 I get 0.006.
Is there a "right" way to do this?
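A minimal sketch for reproducing the observation at several batch sizes. `fake_predict` is a hypothetical stand-in for `my_model.predict` that assumes a fixed cost per call plus a small cost per image; under that assumption the per-image time shrinks as the batch grows, because the fixed cost is spread over more images. Replace `fake_predict` with `my_model.predict` (and a real array) to measure an actual model.

```python
import time


def fake_predict(batch, per_call_s=0.01, per_item_s=0.0001):
    """Hypothetical stand-in for my_model.predict: fixed overhead plus per-item cost."""
    time.sleep(per_call_s + per_item_s * len(batch))
    return [0] * len(batch)


def per_image_time(predict_fn, batch):
    """Wall-clock seconds per item for a single call of predict_fn(batch)."""
    start = time.perf_counter()
    predict_fn(batch)
    end = time.perf_counter()
    return (end - start) / len(batch)


if __name__ == "__main__":
    # Same comparison as in the question: a small batch versus a large one.
    for size in (32, 1024):
        batch = list(range(size))
        print(f"batch={size:5d}  time/image={per_image_time(fake_predict, batch):.6f} s")
```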
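With TensorFlow (Keras) there seems to be no asynchronous-execution problem, but with PyTorch there is: CUDA calls return before the GPU has finished the work, so you have to synchronize before reading the clock [1][2].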
In TensorFlow:
start = time.perf_counter()
result = my_model.predict(images_test)
end = time.perf_counter()
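In PyTorch (the model is called directly; nn.Module has no predict method):
with torch.no_grad():
    torch.cuda.synchronize()  # wait for any pending GPU work
    start = time.perf_counter()
    result = my_model(images_test)
    torch.cuda.synchronize()  # wait for the forward pass to finish
    end = time.perf_counter()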
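The synchronize-then-time pattern above can be wrapped in a small helper, sketched here under the assumption that you pass `torch.cuda.synchronize` as `sync` for a PyTorch model on the GPU and leave `sync=None` for Keras `predict`. The name `timed_call` is hypothetical, not part of either library.

```python
import time
from typing import Any, Callable, Optional


def timed_call(fn: Callable[[], Any],
               sync: Optional[Callable[[], None]] = None) -> float:
    """Return wall-clock seconds for fn(), synchronizing before and after if sync is given.

    sync is meant to be something like torch.cuda.synchronize: it blocks until
    all queued GPU work has finished, so the timer measures the actual compute
    rather than only the kernel launches.
    """
    if sync is not None:
        sync()  # drain work queued before the timed region
    start = time.perf_counter()
    fn()
    if sync is not None:
        sync()  # wait for the work launched by fn() to finish
    return time.perf_counter() - start


# Example use (assumes PyTorch with CUDA and a model already on the GPU):
#   with torch.no_grad():
#       t = timed_call(lambda: my_model(images_test), sync=torch.cuda.synchronize)
# Keras:
#   t = timed_call(lambda: my_model.predict(images_test))
```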
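I also suggest running the prediction in a loop, say 10 times, and printing the list of times. The first call is slower than the later ones because Keras has to load and set up the model the first time, so timing a single call overstates the per-image cost.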
In TensorFlow:
pred_time_list = []
for i in range(10):
    start = time.perf_counter()
    result = my_model.predict(images_test)
    end = time.perf_counter()
    pred_time_list.append(end - start)
print(pred_time_list)
(Printing pred_time_list may show why the times were off: the first entry includes the one-off setup cost.)
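The loop above can be extended into a small benchmark that discards warm-up calls and summarizes the rest. This is a sketch: the name `benchmark`, one warm-up call by default, and reporting the median (which is less sensitive to an occasional slow run than the mean) are all choices made here, not something Keras or PyTorch prescribes. For a PyTorch model on the GPU, `predict_fn` would also need the synchronization shown earlier.

```python
import statistics
import time
from typing import Any, Callable, Dict, List, Sized


def benchmark(predict_fn: Callable[[Any], Any], batch: Sized,
              warmup: int = 1, repeats: int = 10) -> Dict[str, Any]:
    """Time predict_fn(batch) `repeats` times after `warmup` untimed calls."""
    for _ in range(warmup):
        predict_fn(batch)  # untimed: the first calls pay one-off setup costs
    times: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict_fn(batch)
        times.append(time.perf_counter() - start)
    median = statistics.median(times)
    return {
        "times": times,                       # raw per-call timings (seconds)
        "median_s": median,                   # typical time for one call
        "per_item_s": median / len(batch),    # typical time per image
    }


# Example use with a Keras model:
#   result = benchmark(my_model.predict, images_test)
#   print(result["times"], result["per_item_s"])
```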
References:
[1] https://discuss.pytorch.org/t/doing-qr-decomposition-on-gpu-is-much-slower-than-on-cpu/21213/6
[2] https://discuss.pytorch.org/t/is-there-any-code-torch-backends-cudnn-benchmark-torch-cuda-synchronize-similar-in-tensorflow/51484/2