I want to figure out what the TensorRT folks actually mean by an "engine". I want to know this because I am not sure whether I will be able to use the same engine to run inference on GPUs of different physical architectures.
I know that an engine contains some kind of code that executes the neural network's inference step. I want to figure out whether that code is CUDA PTX (a sort of bytecode that the CUDA driver JIT-compiles at load time) or an actual binary compiled for a specific GPU architecture.
I would expect it to be some sort of portable bytecode. Does anyone have a clue?
Thanks a lot!
"I want to know this because I am not sure whether I will be able to use the same engine to run inference on GPUs of different physical architectures."
A TensorRT engine is not portable bytecode: it is optimized for, and contains kernels compiled for, the specific GPU architecture it was built on. An engine built on one GPU architecture should not be used on a different one; you need to rebuild (and re-serialize) the engine on each target architecture. Engines are also tied to the TensorRT version they were built with.
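A practical consequence is that you typically keep one serialized engine per target architecture and pick the right one at load time. Below is a minimal C++ sketch of that pattern, assuming the TensorRT 8.x API; the file-naming scheme (`model_sm<major><minor>.engine`) is my own illustrative convention, not anything TensorRT prescribes.

```cpp
// Sketch: select and deserialize the engine matching this GPU's architecture.
#include <cuda_runtime_api.h>
#include <NvInfer.h>

#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// TensorRT requires a logger; this one just prints warnings and errors.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << "\n";
    }
};

int main() {
    // Query the compute capability (architecture) of the active GPU.
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, /*device=*/0);

    // Hypothetical naming scheme: one engine file per architecture.
    const std::string path = "model_sm" + std::to_string(prop.major)
                           + std::to_string(prop.minor) + ".engine";
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        std::cerr << "No engine built for sm_" << prop.major << prop.minor
                  << "; rebuild one from the network definition on this GPU.\n";
        return 1;
    }
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    Logger logger;
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    // Deserialization returns nullptr if the engine was built for a
    // different GPU architecture or a different TensorRT version.
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());
    if (!engine) {
        std::cerr << "Engine is incompatible with this GPU / TensorRT build.\n";
        return 1;
    }
    std::cout << "Engine loaded for sm_" << prop.major << prop.minor << "\n";
    // ... create an execution context and run inference ...
    return 0;
}
```

The key point the sketch illustrates: deserialization is the moment a mismatch surfaces, so guarding the load path (or keying engine files by compute capability, as assumed here) is how deployments usually cope with the engine's non-portability.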