I am trying to convert a fairly complex model from PyTorch to ONNX. The conversion succeeds without error, but I get this error when loading the model:
Traceback (most recent call last):
  File "/home/***/***/***.py", line 50, in <module>
    main()
  File "/home/***/***/***.py", line 38, in main
    ort_session = ort.InferenceSession(onnx_path, providers=[
  File "/home/***/miniconda3/envs/***/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 324, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/***/miniconda3/envs/***/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 369, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for RandomNormalLike(1) node with name 'RandomNormalLike_598'
I think the RandomNormalLike node the error is complaining about might correspond to this module I have:
from typing import Optional

import torch
from torch import nn


class NoiseInjection(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(1), requires_grad=True)

    def forward(
        self,
        feat: torch.Tensor,
        noise: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        if noise is None:
            # One noise channel per sample, broadcast over all feature channels
            batch, _, height, width = feat.shape
            noise = torch.randn(
                batch, 1, height, width,
                dtype=feat.dtype,
                device=feat.device,
            )
        return feat + self.weight * noise
I also created a different implementation, but it leads to the same error (edit: this version actually works; an unrelated mistake elsewhere misled me into thinking it did not):
def forward(
    self,
    feat: torch.Tensor,
    noise: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    if noise is None:
        noise = torch.randn_like(feat[:, 0:1])
    return feat + self.weight * noise
My PyTorch and ONNX versions are as follows:
$ conda list torch
# Name                    Version                   Build  Channel
torch                     1.10.0+cu113              pypi_0    pypi
torchaudio                0.10.0+cu113              pypi_0    pypi
torchvision               0.11.1+cu113              pypi_0    pypi
$ conda list onnx
# Name                    Version                   Build  Channel
onnx                      1.10.2                    pypi_0    pypi
onnxruntime-gpu           1.9.0                     pypi_0    pypi
What can I do to export such a module to ONNX and run it successfully?
From checking online, I found a similar issue on GitHub about conv (https://github.com/microsoft/onnxruntime/issues/3130). It could be that the parameter types used by torch are not compatible with the RandomNormalLike implementation available in ONNX Runtime.
Could you check in Netron what is inside the RandomNormalLike node(s) to see whether they comply with the spec: https://github.com/onnx/onnx/blob/main/docs/Operators.md#RandomNormal or https://github.com/onnx/onnx/blob/main/docs/Operators.md#RandomNormalLike
Cheers
EDIT: It turns out the RandomNormalLike node has a dtype of 10, which corresponds to fp16 (TensorProto.FLOAT16).
However, the onnxruntime implementation only supports float and double; see the source code here: https://github.com/microsoft/onnxruntime/blob/24e35fba3217bf33b0e4064bc71d271a61938ba0/onnxruntime/core/providers/cpu/generator/random.cc#L354
The solution is either to run the whole model in fp32, or to explicitly ask RandomNormalLike to produce floats or doubles, assuming torch allows mixing fp16 with fp32/fp64 in the computation.
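For the second option, a minimal sketch on the PyTorch side is to sample the noise in float32 and cast it to the feature dtype before using it. This relies on `torch.randn_like` accepting a `dtype` keyword. I am assuming, but have not verified for torch 1.10.0, that the exporter then emits RandomNormalLike with a float dtype followed by a Cast to fp16.

```python
from typing import Optional

import torch
from torch import nn


class NoiseInjection(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(1), requires_grad=True)

    def forward(
        self,
        feat: torch.Tensor,
        noise: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        if noise is None:
            # Sample in float32 (a type onnxruntime's RandomNormalLike supports),
            # then cast to the feature dtype so the rest of the graph stays fp16
            noise = torch.randn_like(feat[:, 0:1], dtype=torch.float32).to(feat.dtype)
        return feat + self.weight * noise
```

In eager mode the module behaves the same for fp16 and fp32 inputs; only the type in which the noise is generated changes.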