The trtexec --help output includes:

--warmUp=N    Run for N milliseconds to warmup before measuring performance (default = 200)
However, why is a warmup needed? If the model (and the intermediate buffers needed for the forward pass) are allocated at model load time, then the only initial performance bottleneck should be the host-to-device memory transfers, and the NVIDIA docs indicate that those are hidden by their enqueueing strategy.
Therefore I'm not sure what else could cause an initial performance bottleneck. Any insight into why the warmup is needed would be much appreciated.
TensorRT needs warmup for multiple reasons:

- GPU clock state: an idle GPU sits in a low power state (P8 in the output below) and only ramps its clocks up under sustained load, so the first iterations run at reduced frequency.
- Lazy initialization: CUDA modules and kernels are loaded (and possibly JIT-compiled) on first use.
- Cold caches: instruction and data caches, as well as internal allocator pools, are only populated after the first few passes.

You can run nvidia-smi to see the current power state and mode:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 528.49       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...   On  | 00000000:65:00.0  On |                  N/A |
|  0%   47C    P8    36W / 350W |    473MiB / 12288MiB |     14%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
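To make the idea concrete, here is a minimal sketch of the warmup-then-measure pattern that --warmUp implements. This is an illustrative, generic benchmark helper, not trtexec's actual internals; the function and parameter names are my own.

```python
import time

def benchmark(fn, warmup_iters=10, measured_iters=100):
    """Run fn a few times untimed, then return the mean time per call.

    The warmup calls execute fn but discard the timings, so one-time
    costs (clock ramp-up, lazy kernel loading, cold caches) do not
    pollute the reported average. Names here are illustrative.
    """
    for _ in range(warmup_iters):
        fn()  # executed but not measured
    start = time.perf_counter()
    for _ in range(measured_iters):
        fn()
    return (time.perf_counter() - start) / measured_iters  # mean seconds/call
```

Without the warmup loop, the one-time costs above would be averaged into the first measured iterations and inflate the reported latency, which is exactly what the 200 ms default is there to avoid.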