Tags: deterministic, non-deterministic, tensorrt, nvidia-jetson, half-precision-float

Is TensorRT "floating-point 16" precision mode non-deterministic on Jetson TX2?


I'm using TensorRT FP16 precision mode to optimize my deep learning model, and I run the optimized model on a Jetson TX2. While testing the model, I observed that the TensorRT inference engine is not deterministic: the optimized model gives FPS values that vary between 40 and 120 FPS for the same input images.

I started to suspect that the source of the non-determinism is floating-point operations when I saw this comment about CUDA:

"If your code uses floating-point atomics, results may differ from run to run because floating-point operations are generally not associative, and the order in which data enters a computation (e.g. a sum) is non-deterministic when atomics are used."

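For example (a standalone illustration I put together, not anything TensorRT-specific), the grouping of a plain float sum already changes the result:

    #include <cstdio>

    int main()
    {
      // Float addition is not associative: regrouping the same three values
      // gives two different answers because of rounding.
      float a = 1.0e8f, b = -1.0e8f, c = 1.0f;

      float leftToRight = (a + b) + c;  // (1e8 + -1e8) + 1  ->  0 + 1  ->  1
      float rightToLeft = a + (b + c);  // 1e8 + (-1e8 + 1)  ->  the 1 is absorbed  ->  0

      std::printf("(a + b) + c = %f\n", leftToRight);  // prints 1.000000
      std::printf("a + (b + c) = %f\n", rightToLeft);  // prints 0.000000
      return 0;
    }
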
Does the precision type (FP16, FP32, or INT8) affect the determinism of TensorRT? Or is it something else?

Do you have any thoughts?

Best regards.


Solution

  • I solved the problem by changing the clock() function I was using to measure latency. clock() measures CPU time, but what I want is real (wall-clock) time; presumably the CPU is sometimes idle while the GPU runs the inference, so the CPU-time numbers fluctuate even when the wall-clock latency does not. I now measure latency with std::chrono, and the inference results are latency-deterministic.

    This was the wrong approach (clock()):

    #include <stdio.h>
    #include <time.h>

    int main ()
    {
      clock_t t = clock();
      inferenceEngine(); // Do the inference
      t = clock() - t;
      printf ("It took me %ld clicks (%f seconds).\n", (long)t, ((float)t)/CLOCKS_PER_SEC);
      return 0;
    }
    

    Use CUDA events like this (cudaEvent):

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main ()
    {
      cudaEvent_t start, stop;
      cudaEventCreate(&start);
      cudaEventCreate(&stop);

      cudaEventRecord(start);          // timestamp on the GPU stream
      inferenceEngine(); // Do the inference
      cudaEventRecord(stop);

      cudaEventSynchronize(stop);      // wait until the stop event has completed
      float milliseconds = 0;
      cudaEventElapsedTime(&milliseconds, start, stop);
      printf ("It took me %f ms.\n", milliseconds);

      cudaEventDestroy(start);
      cudaEventDestroy(stop);
      return 0;
    }
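
    (A note on the CUDA-event version: the events are recorded on the GPU stream, so this times the device work itself; per the CUDA documentation, cudaEventElapsedTime() returns milliseconds with a resolution of around 0.5 microseconds.)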
    

    Or use std::chrono like this:

    #include <iostream>
    #include <chrono>
    #include <ctime>
    int main()
    {
      auto start = std::chrono::system_clock::now();
      inferenceEngine(); // Do the inference
      auto end = std::chrono::system_clock::now();
    
      std::chrono::duration<double> elapsed_seconds = end-start;
      std::time_t end_time = std::chrono::system_clock::to_time_t(end);
    
      std::cout << "finished computation at " << std::ctime(&end_time)
                << "elapsed time: " << elapsed_seconds.count() << "s\n";
    }
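
    (One refinement to the std::chrono version: when only the elapsed interval matters, std::chrono::steady_clock is generally preferable to system_clock, since it is monotonic and unaffected by system clock adjustments; system_clock is only needed here to print the calendar time at which the run finished.)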