PyTorch to ONNX: Could not find an implementation for RandomNormalLike

I am trying to convert a fairly complex model from PyTorch to ONNX. The export succeeds without error, but loading the exported model in ONNX Runtime fails with this error:

Traceback (most recent call last):
  File "/home/***/***/***.py", line 50, in <module>
    main()
  File "/home/***/***/***.py", line 38, in main
    ort_session = ort.InferenceSession(onnx_path, providers=[
  File "/home/***/miniconda3/envs/***/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 324, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/***/miniconda3/envs/***/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 369, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for RandomNormalLike(1) node with name 'RandomNormalLike_598'

I think that the RandomNormalLike node the error is complaining about might correspond to this module I have:

from typing import Optional

import torch
import torch.nn as nn


class NoiseInjection(nn.Module):
    def __init__(self):
        super().__init__()

        self.weight = nn.Parameter(torch.zeros(1), requires_grad=True)

    def forward(
        self,
        feat: torch.Tensor,
        noise: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        if noise is None:
            batch, _, height, width = feat.shape
            noise = torch.randn(
                batch, 1, height, width,
                dtype=feat.dtype,
                device=feat.device,
            )

        return feat + self.weight * noise

I also created a different implementation, which at first seemed to lead to the same error (edit: this version actually works; I made an unrelated mistake elsewhere that misled me into thinking it did not):

    def forward(
        self,
        feat: torch.Tensor,
        noise: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        if noise is None:
            noise = torch.randn_like(feat[:, 0:1])

        return feat + self.weight * noise

My PyTorch and ONNX versions are as follows:

$ conda list torch
# Name                    Version                   Build  Channel
torch                     1.10.0+cu113             pypi_0    pypi
torchaudio                0.10.0+cu113             pypi_0    pypi
torchvision               0.11.1+cu113             pypi_0    pypi

$ conda list onnx
# Name                    Version                   Build  Channel
onnx                      1.10.2                   pypi_0    pypi
onnxruntime-gpu           1.9.0                    pypi_0    pypi

What can be done to export such a module to ONNX and run it successfully?

Solution

While searching online I found a similar issue on GitHub about Conv (https://github.com/microsoft/onnxruntime/issues/3130). It could be that the data types of the tensors used in torch are not supported by the RandomNormalLike implementation available in ONNX Runtime.

Could you check in Netron what's inside the RandomNormalLike node(s), to see whether they comply with the spec? See https://github.com/onnx/onnx/blob/main/docs/Operators.md#RandomNormal or https://github.com/onnx/onnx/blob/main/docs/Operators.md#RandomNormalLike

Cheers

EDIT: it turns out the RandomNormal node has a type of 10, which corresponds to fp16 (FLOAT16 in the ONNX data type enum).

The onnxruntime implementation, however, only supports float and double; see the source code here: https://github.com/microsoft/onnxruntime/blob/24e35fba3217bf33b0e4064bc71d271a61938ba0/onnxruntime/core/providers/cpu/generator/random.cc#L354
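To make the type code concrete, here is a small lookup written as a sketch. The integer codes are the ones from the ONNX TensorProto data type enum; the "supported" set simply encodes the statement above (float and double for the CPU kernel at that commit) and is an assumption for any other onnxruntime version or execution provider. The names ONNX_DTYPE_NAMES, CPU_RANDOM_SUPPORTED and describe_dtype are made up for this example.

```python
# Subset of the ONNX TensorProto.DataType enum relevant to floating point.
ONNX_DTYPE_NAMES = {
    1: "FLOAT",      # fp32
    10: "FLOAT16",   # fp16
    11: "DOUBLE",    # fp64
    16: "BFLOAT16",
}

# Assumption taken from the answer text: the CPU kernel handles float and double only.
CPU_RANDOM_SUPPORTED = {1, 11}


def describe_dtype(code):
    """Return (dtype name, whether the CPU random kernel is assumed to support it)."""
    name = ONNX_DTYPE_NAMES.get(code, "UNKNOWN")
    return name, code in CPU_RANDOM_SUPPORTED


print(describe_dtype(10))  # the type found on the failing node
```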

The solution is either to run the whole model in fp32, or to explicitly ask RandomNormalLike to use float or double, hoping that torch allows mixed computation between fp16 and fp32/fp64, I guess.
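One way the second option might look in the module from the question is sketched below: the noise is sampled in float32 and then cast to the dtype of feat, so no mixed-precision arithmetic is needed. I would expect this to export as a float32 random node followed by a Cast, but that is an assumption I have not verified across exporter versions, so it is worth inspecting the exported graph again afterwards.

```python
from typing import Optional

import torch
import torch.nn as nn


class NoiseInjection(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(1))

    def forward(
        self,
        feat: torch.Tensor,
        noise: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        if noise is None:
            batch, _, height, width = feat.shape
            # Sample in float32 regardless of feat's dtype, so the random op
            # itself never has to produce fp16 ...
            noise = torch.randn(
                batch, 1, height, width,
                dtype=torch.float32,
                device=feat.device,
            )
            # ... then cast to feat's dtype so the arithmetic below is not mixed.
            noise = noise.to(feat.dtype)

        return feat + self.weight * noise
```

Passing a precomputed noise tensor still works as before; only the default sampling path changes.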