I understand that my question is not directly related to programming itself and looks more like research. But perhaps someone can advise here.
I have an idea for an app: the user takes a photo, and the app analyzes it, cuts out everything except the required object (a piece of clothing, for example), and saves it as a separate image. Until recently this was a very difficult task, because the developer had to build a pretty good neural network and train it. But now that Apple has released the iPhone X with the TrueDepth camera, half of the problem is solved. As I understand it, a developer can remove the background much more easily, because the iPhone knows where the background is located.
So only a few questions are left:
I. What is the format of photos taken by the iPhone X with the TrueDepth camera? Is it possible to create a neural network that can use the depth information from the picture?
II. I've read about CoreML and tried some examples, but it's still not clear to me how the following behaviour can be achieved with an external neural network imported into CoreML:
The neural network gets an image as input.
The NN analyzes it and finds the required object in the image.
The NN returns not only the determined type of object, but also the cropped object itself, or an array of coordinates/pixels of the area that should be cropped.
The application gets all the required information from the NN and performs the necessary actions to crop the image and save it to another file, or whatever.
Any advice will be appreciated.
Ok, your question is actually directly related to programming:)
Ad I. The format is HEIF, but you access the image data (if you develop an iPhone app) by means of iOS APIs, so you can easily get the bitmap as a CVPixelBuffer.
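As for the depth information itself: if the photo was captured with depth, it is stored as auxiliary data in the HEIF file, and you can read it back as an AVDepthData. A minimal sketch (assuming you already have a file URL to the photo; not production code):

```swift
import AVFoundation
import ImageIO

// Sketch: read the depth map stored as auxiliary data in a HEIF photo.
// `photoURL` is assumed to point to a photo captured with depth information.
func depthData(from photoURL: URL) -> AVDepthData? {
    guard let source = CGImageSourceCreateWithURL(photoURL as CFURL, nil),
          let auxInfo = CGImageSourceCopyAuxiliaryDataInfoAtIndex(
              source, 0, kCGImageAuxiliaryDataTypeDisparity) as? [AnyHashable: Any]
    else { return nil }
    return try? AVDepthData(fromDictionaryRepresentation: auxInfo)
}

// The depth map is itself a CVPixelBuffer you could feed to (or alongside) your network:
// let depthMap = depthData(from: url)?.depthDataMap
```

So yes, in principle a network can consume the depth map too; it is just another pixel buffer.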
Ad II. 1. The neural network gets an image as input.
As mentioned above, you first want to get your bitmap, so create a CVPixelBuffer. Check out this post for an example. Then you use the CoreML API. You want to use the MLFeatureProvider protocol. An object that conforms to it is where you put your vector data with MLFeatureValue, under a key name picked by you (like "pixelData").
import CoreML

class YourImageFeatureProvider: MLFeatureProvider {
    let imageFeatureValue: MLFeatureValue
    var featureNames: Set<String> = []

    init(with imageFeatureValue: MLFeatureValue) {
        featureNames.insert("pixelData")
        self.imageFeatureValue = imageFeatureValue
    }

    func featureValue(for featureName: String) -> MLFeatureValue? {
        guard featureName == "pixelData" else {
            return nil
        }
        return imageFeatureValue
    }
}
Then you use it like this; the feature value is created with the init(pixelBuffer:) initializer on MLFeatureValue:
let imageFeatureValue = MLFeatureValue(pixelBuffer: yourPixelBuffer)
let featureProvider = YourImageFeatureProvider(imageFeatureValue: imageFeatureValue)
Remember to crop/scale the image before this operation, so that your network is fed a vector of the proper size.
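A minimal sketch of that scale step (the 224x224 size and BGRA pixel format are assumptions; use whatever your model's input description specifies):

```swift
import CoreVideo
import UIKit

// Sketch: render a UIImage into a BGRA CVPixelBuffer of the size your model expects.
// 224x224 is an assumption; check your model's input description.
func pixelBuffer(from image: UIImage, width: Int = 224, height: Int = 224) -> CVPixelBuffer? {
    var buffer: CVPixelBuffer?
    let attrs = [kCVPixelBufferCGImageCompatibilityKey: true,
                 kCVPixelBufferCGBitmapContextCompatibilityKey: true] as CFDictionary
    guard CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                              kCVPixelFormatType_32BGRA, attrs, &buffer) == kCVReturnSuccess,
          let pixelBuffer = buffer else { return nil }

    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

    guard let context = CGContext(data: CVPixelBufferGetBaseAddress(pixelBuffer),
                                  width: width, height: height, bitsPerComponent: 8,
                                  bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue
                                      | CGBitmapInfo.byteOrder32Little.rawValue),
          let cgImage = image.cgImage else { return nil }

    // This scales the whole image; crop first if your model expects a square region.
    context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
    return pixelBuffer
}
```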
Use the prediction(from:) function on your CoreML model.
do {
    let outputFeatureProvider = try yourModel.prediction(from: featureProvider)
    // success! your output feature provider has your data
} catch {
    // your model failed to predict, check the error
}
This depends on your model and whether you imported it correctly. Under the assumption you did, you access the output data by checking the returned MLFeatureProvider (remember that this is a protocol, so you would have to implement another one similar to what I made for you in step 1, something like YourOutputFeatureProvider), and there you have a bitmap and the rest of the data your NN spits out.
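For example, assuming your model's outputs are named "objectType" and "mask" (these names are made up here; they must match whatever output names your model declares), reading them could look like:

```swift
import CoreML

// Sketch: pull outputs from the feature provider returned by prediction(from:).
// The feature names "objectType" and "mask" are assumptions; they must match
// the output names declared by your model.
func handle(output: MLFeatureProvider) {
    if let label = output.featureValue(for: "objectType")?.stringValue {
        print("Detected object: \(label)")
    }
    if let maskBuffer = output.featureValue(for: "mask")?.imageBufferValue {
        // maskBuffer is a CVPixelBuffer describing the area to keep/crop.
        print("Got mask \(CVPixelBufferGetWidth(maskBuffer))x\(CVPixelBufferGetHeight(maskBuffer))")
    }
}
```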
Just reverse step 1, so go from MLFeatureValue -> CVPixelBuffer -> UIImage. There are plenty of questions on SO about this, so I won't repeat the answers.
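One common route is via Core Image (a sketch; other conversions exist):

```swift
import CoreImage
import UIKit

// Sketch: convert a CVPixelBuffer back into a UIImage via Core Image.
func image(from pixelBuffer: CVPixelBuffer) -> UIImage? {
    let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
    let context = CIContext()
    guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else {
        return nil
    }
    return UIImage(cgImage: cgImage)
}
```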
If you are a beginner, don't expect results overnight, but the path is here. For an experienced dev I would estimate this work at several hours to get it done (plus model training time and porting it to CoreML).
Apart from CoreML (maybe your model is too sophisticated and you won't be able to port it to CoreML), check out Matthijs Hollemans' GitHub (very good resources on different ways of porting models to iOS). He is also around here and knows a lot about the subject.