Tags: swift, deep-learning, pytorch, nsnumber, torchscript

Different results after converting pytorch to torchscript? Does converting NSNumber to Float cause any loss?


I converted a pytorch pretrained model (.pt) to a torchscript model (.pt) to use it in Swift 5 (iOS, iPhone 6s, Xcode 11). In Swift, the model's “predict” function gave me its embedding values (Tensor). Since it returned an [NSNumber] array as the prediction result, I cast [NSNumber] to [Double] or [Float] to calculate the distance between two embedding values: L2 normalization, dot product, etc.
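As a minimal sketch (with made-up embedding values, just to illustrate the operations I mean), the distance calculation looks like this in pytorch:

```python
import torch

# Hypothetical embedding values standing in for two prediction outputs.
emb_a = torch.tensor([0.0347, 0.0145, -0.0124, 0.0723])
emb_b = torch.tensor([0.0350, 0.0150, -0.0120, 0.0720])

# L2-normalize, then take the dot product (cosine similarity)
# and the norm of the difference (L2 distance).
a = emb_a / emb_a.norm()
b = emb_b / emb_b.norm()
cosine_sim = torch.dot(a, b).item()
l2_dist = (a - b).norm().item()
print(cosine_sim, l2_dist)
```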

However, while the pytorch version got the correct answers, the torchscript model got many wrong answers. Not only are the answers different, the distance calculations for the same pairs of embeddings in torchscript also differ from the results of the pytorch model on the PC (CPU, PyCharm). In fact, even before casting for the distance calculations, the embedding values in NSNumber (Swift) are very different from the float32 values in pytorch. I used the same input images.

I tried to find the reason. First, I copied the embedding values ([NSNumber]) from swift-torchscript and calculated the distance between two embeddings in pytorch, to check whether there was a problem with my distance-calculation implementation in Swift. I used torch.FloatTensor for the [NSNumber] -> [Float] cast, and also tried [Double]. As a result, I found many infinite numbers. Are these infinite values related to the wrong answers?

What does this “inf” mean? Is it a calculation or type-casting error? Did I lose information while casting from NSNumber to Float or Double? How can I get the correct values from the torchscript model in Swift? What should I check?
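One check I can run on the pytorch side is to count the non-finite entries in the values pasted back from Swift before doing any distance math (the sample numbers below are made up for illustration):

```python
import torch

# Made-up sample standing in for values copied back from Swift;
# in practice these would be the [NSNumber] -> Float values from the app.
swift_values = [0.0347, float('inf'), -0.0124, float('nan')]
t = torch.tensor(swift_values, dtype=torch.float32)

# Count non-finite entries before doing any distance calculation.
n_inf = torch.isinf(t).sum().item()
n_nan = torch.isnan(t).sum().item()
print(n_inf, n_nan)
```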

I used the following code to convert pytorch -> torchscript:

import torch
from models.inception_resnet_v1 import InceptionResnetV1

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
resnet = InceptionResnetV1(pretrained='vggface2').eval().to(device)

example = torch.rand(1, 3, 160, 160)
traced_script_module = torch.jit.trace(resnet, example)
traced_script_module.save("mobile_model.pt")

Solution

  • Are you using InceptionResnetV1 from https://github.com/timesler/facenet-pytorch? When you refer to the pytorch model in your comparison of the outputs, do you mean the torchscript model run in pytorch, or the resnet as is?

    If it is the latter, did you already check something similar to the following?

    What do you get when running the following:

    print('Original:')
    orig_res = resnet(example)
    print(orig_res.shape)
    print(orig_res[0, 0:10])
    print('min abs value:{}'.format(torch.min(torch.abs(orig_res))))
    print('Torchscript:')
    ts_res = traced_script_module(example)
    print(ts_res.shape)
    print(ts_res[0, 0:10])
    print('min abs value:{}'.format(torch.min(torch.abs(ts_res))))
    print('Dif sum:')
    abs_diff = torch.abs(orig_res-ts_res)
    print(torch.sum(abs_diff))
    print('max dif:{}'.format(torch.max(abs_diff)))
    

    after defining 'traced_script_module'. I get the following:

    Original:
    torch.Size([1, 512])
    tensor([ 0.0347,  0.0145, -0.0124,  0.0723, -0.0102,  0.0653, -0.0574,  0.0004,
            -0.0686,  0.0695], device='cuda:0', grad_fn=<SliceBackward>)
    min abs value:0.00034740756382234395
    Torchscript:
    torch.Size([1, 512])
    tensor([ 0.0347,  0.0145, -0.0124,  0.0723, -0.0102,  0.0653, -0.0574,  0.0004,
            -0.0686,  0.0695], device='cuda:0', grad_fn=<SliceBackward>)
    min abs value:0.0003474018594715744
    Dif sum:
    tensor(8.1539e-06, device='cuda:0', grad_fn=<SumBackward0>)
    max dif:5.960464477539063e-08
    

    which is not perfect, but considering that the outputs are on the order of 10^-4 at minimum, and that the second-to-last number is the sum of absolute differences over 512 elements, not the mean, it does not seem too far off to me. The maximum difference is around 10^-8.

    By the way, you might want to change to:

    example = torch.rand(1, 3, 160, 160).to(device)
    

    If you get something similar for the tests above, what kind of values do you get for the first 10 outputs from swift-torchscript as NSNumber? And then, once cast to float, how do they compare against the same slices of the pytorch and torchscript-pytorch model outputs?
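    As a sketch of that comparison (the numbers below are placeholders; you would paste in your actual values from Swift and the slice orig_res[0, 0:10] from pytorch):

    ```python
    import torch

    # Placeholder values: substitute the first 10 numbers from the Swift
    # [NSNumber] output (cast to Float) and the pytorch output slice.
    swift_first10 = torch.tensor([0.0347, 0.0145, -0.0124, 0.0723, -0.0102,
                                  0.0653, -0.0574, 0.0004, -0.0686, 0.0695])
    pytorch_first10 = torch.tensor([0.0347, 0.0145, -0.0124, 0.0723, -0.0102,
                                    0.0653, -0.0574, 0.0004, -0.0686, 0.0695])

    # Element-wise absolute difference; large values or inf here would
    # point at the Swift side rather than the torchscript conversion.
    max_dif = torch.max(torch.abs(swift_first10 - pytorch_first10)).item()
    print(max_dif)
    ```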