I have trained a neural network in Keras to detect keypoints on an image. The network expects images of shape (224, 224, 3). I would like to be able to detect keypoints on images in Swift using CoreML, but I am unsure how to use non-square images with my neural network that expects square inputs. Any idea on proper pre-processing steps? Is there something built into the Vision API to help with this? I could just squish the images to squares, but I imagine that would mess with the predicted (x, y) keypoint pairs.
It depends. How was the original model trained? If it was also trained on squashed images, then squashing them during inference is fine.
If not, and you want to preserve the aspect ratio of the images, you may want to set the imageCropAndScaleOption on your VNCoreMLRequest object to one of the other modes (for example, .scaleFit scales the image while preserving its aspect ratio, and .centerCrop scales and then crops the longer dimension, instead of .scaleFill which squashes the image).
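As a rough sketch of what that looks like (assuming an Xcode-generated model class named KeypointModel and that you're starting from a CGImage; swap in your own names):

```swift
import CoreML
import Vision

func detectKeypoints(in image: CGImage) throws {
    // "KeypointModel" is a placeholder for your Xcode-generated model class.
    let mlModel = try KeypointModel(configuration: MLModelConfiguration()).model
    let visionModel = try VNCoreMLModel(for: mlModel)

    let request = VNCoreMLRequest(model: visionModel) { request, error in
        guard let results = request.results as? [VNCoreMLFeatureValueObservation] else { return }
        // Read your keypoint output (e.g. an MLMultiArray) from `results` here.
    }

    // .scaleFill squashes the whole image to the 224x224 input (matches training on
    // squashed images); .scaleFit preserves aspect ratio, .centerCrop crops the longer side.
    request.imageCropAndScaleOption = .scaleFill

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
}
```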
Regardless, you will have to convert the predicted keypoint coordinates back to whatever size you're displaying the image in. This is easiest when the image is squashed (just multiply by the display width and height), but a little trickier when using one of the other imageCropAndScaleOptions.
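For the squashed (.scaleFill) case, the mapping back is a straight multiply. A minimal sketch, assuming the model outputs keypoints normalized to [0, 1] (if yours outputs pixel coordinates in the 224x224 input, divide by 224 first):

```swift
import CoreGraphics

/// Maps a keypoint from normalized model-output space back to display coordinates,
/// assuming the image was squashed to the square model input (.scaleFill).
func displayPoint(fromNormalized point: CGPoint, displaySize: CGSize) -> CGPoint {
    CGPoint(x: point.x * displaySize.width,
            y: point.y * displaySize.height)
}

// Example: a keypoint at (0.5, 0.25) on an image displayed at 640x480
// maps to (320, 120).
let p = displayPoint(fromNormalized: CGPoint(x: 0.5, y: 0.25),
                     displaySize: CGSize(width: 640, height: 480))
```

With .scaleFit or .centerCrop you would additionally have to undo the uniform scale factor and the padding or crop offset before multiplying, since only part of the model's input square corresponds to the original image.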