I'm using TensorRT's FP16 precision mode to optimize my deep learning model, and I run the optimized model on a Jetson TX2. While testing it, I observed that the TensorRT inference engine does not seem to be deterministic: for the same input images, the measured speed varies between 40 and 120 FPS.
I started to suspect that floating-point operations are the source of the non-determinism when I saw this comment about CUDA:
"If your code uses floating-point atomics, results may differ from run to run because floating-point operations are generally not associative, and the order in which data enters a computation (e.g. a sum) is non-deterministic when atomics are used."
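The non-associativity mentioned in the quote can be illustrated with a minimal sketch in plain C++. The function names sum_left and sum_right are made up for this example and are not part of CUDA or TensorRT. Note that this kind of effect changes numerical results, not execution time.

```cpp
// Add three floats in two different groupings.
// The volatile temporaries force each partial sum to be rounded to
// single precision, so the effect is visible on any compiler.
float sum_left(float a, float b, float c)
{
    volatile float partial = a + b;   // (a + b) first
    return partial + c;
}

float sum_right(float a, float b, float c)
{
    volatile float partial = b + c;   // (b + c) first
    return a + partial;
}
```

With a = 1e8f, b = -1e8f and c = 1.0f, the first grouping keeps the 1.0f, while in the second grouping the 1.0f is lost when it is added to -1e8f, because neighbouring float values near 1e8 are 8 apart.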
Does the precision type (FP16, FP32 or INT8) affect the determinism of TensorRT? Or does anything else?
Do you have any thoughts?
Best regards.
I solved the problem by replacing the clock() function that I used for measuring latency. clock() measures CPU time, but what I want to measure is real (wall-clock) time. I now use std::chrono to measure latency, and the measured inference latency is consistent from run to run.
This was the wrong approach (clock()):
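#include <cstdio>
#include <ctime>
int main ()
{
    clock_t t;
    t = clock(); // clock() reports CPU time used by the process, not wall-clock time
    inferenceEngine(); // Do the inference
    t = clock() - t;
    printf ("It took me %ld clicks (%f seconds).\n",(long)t,((float)t)/CLOCKS_PER_SEC);
    return 0;
}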
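To see why clock() misleads here, a small sketch can time the same call with both clocks. fake_inference, Timing and time_both are hypothetical names for this illustration. The sleep stands in for a call that mostly waits (for example on the GPU) instead of using the CPU, and the behaviour described assumes a Linux/POSIX system, where clock() counts processor time.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

struct Timing {
    double cpu_seconds;   // from clock(): processor time used by the process
    double wall_seconds;  // from steady_clock: elapsed real time
};

// Hypothetical stand-in for a call that waits instead of computing on the CPU.
void fake_inference()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
}

// Time one call to work() with both clocks.
Timing time_both(void (*work)())
{
    std::clock_t c0 = std::clock();
    auto w0 = std::chrono::steady_clock::now();
    work();
    auto w1 = std::chrono::steady_clock::now();
    std::clock_t c1 = std::clock();
    return { static_cast<double>(c1 - c0) / CLOCKS_PER_SEC,
             std::chrono::duration<double>(w1 - w0).count() };
}
```

Calling time_both(fake_inference) should report roughly 0.2 s of wall time but close to zero CPU time, so an FPS figure derived from clock() would not reflect how long the call really took.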
Use CUDA events like this (cudaEvent):
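cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start); // recorded on the default stream
inferenceEngine(); // Do the inference
cudaEventRecord(stop);
cudaEventSynchronize(stop); // wait until the stop event has actually been reached
float milliseconds = 0;
cudaEventElapsedTime(&milliseconds, start, stop); // elapsed time in milliseconds
cudaEventDestroy(start);
cudaEventDestroy(stop);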
Use std::chrono like this:
#include <iostream>
#include <chrono>
#include <ctime>
int main()
{
    // steady_clock is monotonic, so it is the safer choice for measuring intervals
    auto start = std::chrono::steady_clock::now();
    inferenceEngine(); // Do the inference
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed_seconds = end-start;
    // system_clock is used only for the human-readable finish timestamp
    std::time_t end_time = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
    std::cout << "finished computation at " << std::ctime(&end_time)
              << "elapsed time: " << elapsed_seconds.count() << "s\n";
}
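A single measurement can still be noisy, so it may help to time several runs and look at the spread. The following is only a sketch: LatencyStats and measure_latency are names invented for this example, the warm-up and iteration counts are arbitrary, and it assumes that the callable blocks until the inference has actually finished (if the GPU work is asynchronous, it has to be synchronized inside the callable) and that iterations is greater than zero.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

struct LatencyStats {
    double min_ms;
    double mean_ms;
    double max_ms;
};

// Run `run` a few times untimed (warm-up), then time `iterations` calls
// with the monotonic steady_clock and summarize the wall-clock latencies.
LatencyStats measure_latency(const std::function<void()>& run,
                             int warmup, int iterations)
{
    for (int i = 0; i < warmup; ++i) {
        run();  // results discarded: first calls often include one-time setup
    }

    std::vector<double> samples;
    samples.reserve(iterations);
    for (int i = 0; i < iterations; ++i) {
        auto start = std::chrono::steady_clock::now();
        run();
        auto end = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }

    double sum = 0.0;
    for (double s : samples) {
        sum += s;
    }
    return { *std::min_element(samples.begin(), samples.end()),
             sum / samples.size(),
             *std::max_element(samples.begin(), samples.end()) };
}
```

It could be called as measure_latency([]{ inferenceEngine(); }, 10, 100), and the frame rate then follows as 1000.0 / mean_ms. A large gap between min_ms and max_ms would point to real variation in latency and not just a measurement artifact.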