Search code examples
swiftios11xcode9-betaarkitcoreml

Combining CoreML and ARKit


I am trying to combine CoreML and ARKit in my project using the given inceptionV3 model on Apple website.

I am starting from the standard template for ARKit (Xcode 9 beta 3)

Instead of intanciating a new camera session, I reuse the session that has been started by the ARSCNView.

At the end of my viewDelegate, I write:

sceneView.session.delegate = self

I then extend my viewController to conform to the ARSessionDelegate protocol (optional protocol)

// MARK: ARSessionDelegate
extension ViewController: ARSessionDelegate {

    func session(_ session: ARSession, didUpdate frame: ARFrame) {

        do {
            let prediction = try self.model.prediction(image: frame.capturedImage)
            DispatchQueue.main.async {
                if let prob = prediction.classLabelProbs[prediction.classLabel] {
                    self.textLabel.text = "\(prediction.classLabel) \(String(describing: prob))"
                }
            }
        }
        catch let error as NSError {
            print("Unexpected error ocurred: \(error.localizedDescription).")
        }
    }
}

At first I tried that code, but then noticed that inception requires a pixel Buffer of type Image. < RGB,<299,299>.

Although not recommenced, I thought I would just resize my frame then try to get a prediction out of it. I am resizing using this function (took it from https://github.com/yulingtianxia/Core-ML-Sample)

func resize(pixelBuffer: CVPixelBuffer) -> CVPixelBuffer? {
    let imageSide = 299
    var ciImage = CIImage(cvPixelBuffer: pixelBuffer, options: nil)
    let transform = CGAffineTransform(scaleX: CGFloat(imageSide) / CGFloat(CVPixelBufferGetWidth(pixelBuffer)), y: CGFloat(imageSide) / CGFloat(CVPixelBufferGetHeight(pixelBuffer)))
    ciImage = ciImage.transformed(by: transform).cropped(to: CGRect(x: 0, y: 0, width: imageSide, height: imageSide))
    let ciContext = CIContext()
    var resizeBuffer: CVPixelBuffer?
    CVPixelBufferCreate(kCFAllocatorDefault, imageSide, imageSide, CVPixelBufferGetPixelFormatType(pixelBuffer), nil, &resizeBuffer)
    ciContext.render(ciImage, to: resizeBuffer!)
    return resizeBuffer
} 

Unfortunately, this is not enough to make it work. This is the error that is catched:

Unexpected error ocurred: Input image feature image does not match model description.
2017-07-20 AR+MLPhotoDuplicatePrediction[928:298214] [core] 
    Error Domain=com.apple.CoreML Code=1 
    "Input image feature image does not match model description" 
    UserInfo={NSLocalizedDescription=Input image feature image does not match model description, 
    NSUnderlyingError=0x1c4a49fc0 {Error Domain=com.apple.CoreML Code=1 
    "Image is not expected type 32-BGRA or 32-ARGB, instead is Unsupported (875704422)" 
    UserInfo={NSLocalizedDescription=Image is not expected type 32-BGRA or 32-ARGB, instead is Unsupported (875704422)}}}

Not sure what I can do from here.

If there is any better suggestion to combine both, I'm all ears.

Edit: I also tried the resizePixelBuffer method from the YOLO-CoreML-MPSNNGraph suggested by @dfd , the error is exactly the same.

Edit2: So I changed the pixel format to be kCVPixelFormatType_32BGRA (not the same format as the pixelBuffer passed in the resizePixelBuffer).

let pixelFormat = kCVPixelFormatType_32BGRA // line 48

I do not have the error anymore. But as soon as I try to make a prediction, the AVCaptureSession stops. Seems I am running into the same issue Enric_SA is running on the apple developers forum.

Edit3: So I tried implementing rickster solution. Works well with inceptionV3. I wanted to try a a feature observation (VNClassificationObservation). At this time, it is not working using TinyYolo. The bounding are wrong. Trying to figure it out.


Solution

  • Don't process images yourself to feed them to Core ML. Use Vision. (No, not that one. This one.) Vision takes an ML model and any of several image types (including CVPixelBuffer) and automatically gets the image to the right size and aspect ratio and pixel format for the model to evaluate, then gives you the model's results.

    Here's a rough skeleton of the code you'd need:

    var request: VNRequest
    
    func setup() {
        let model = try VNCoreMLModel(for: MyCoreMLGeneratedModelClass().model)
        request = VNCoreMLRequest(model: model, completionHandler: myResultsMethod)
    }
    
    func classifyARFrame() {
        let handler = VNImageRequestHandler(cvPixelBuffer: session.currentFrame.capturedImage,
            orientation: .up) // fix based on your UI orientation
        handler.perform([request])
    }
    
    func myResultsMethod(request: VNRequest, error: Error?) {
        guard let results = request.results as? [VNClassificationObservation]
            else { fatalError("huh") }
        for classification in results {
            print(classification.identifier, // the scene label
                  classification.confidence)
        }
    }
    

    See this answer to another question for some more pointers.