ios swift computer-vision arkit apple-vision

Frame information in completion handler for text detection in ARSession

I am using the Vision framework to detect text boxes in an ARKit session. My problem is accessing the frame to perform a hit test once I have detected the boxes.

func startTextDetection() {
    let textRequest = VNDetectTextRectanglesRequest(completionHandler: self.detectTextHandler)
    textRequest.reportCharacterBoxes = true
    self.requests = [textRequest]
}

func detectTextHandler(request: VNRequest, error: Error?) {
    guard let observations = request.results else {
        print("no result")
        return
    }

    let result = observations.map({$0 as? VNTextObservation})
    for box in result {
        let hit = frame.hitTest(box?.topRight - box?.bottomLeft, types: ARHitTestResult.ResultType.featurePoint )
        let anchor = ARAnchor(transform:hit.worldTransform)
        sceneView.session.add(anchor:anchor)
    }
    //DispatchQueue.main.async() {

    //}
}

Ideally I would pass the frame to the completion handler from the ARSession delegate method, but although the documentation says I can pass a completion handler here, I have not found a way to do it.

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // Retain the image buffer for Vision processing.
    let pixelBuffer = frame.capturedImage
    let requestOptions:[VNImageOption : Any] = [:]

    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: CGImagePropertyOrientation.up, options: requestOptions)

    do {
        try imageRequestHandler.perform(self.requests)
    } catch {
        print(error)
    }
}

I could keep a dictionary of frames and look the right one up, but that is not really elegant and it is prone to bugs and leaks. I would rather pass the relevant frame at the point where I request the text detection.

Any ideas?

Solution

Why don't you use your session's currentFrame property inside the completion handler? It holds the most recent frame of the session, so you no longer need to pass a frame instance to the completion handler. It is accessible through your sceneView instance.

So you can change your completion handler as follows:

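func detectTextHandler(request: VNRequest, error: Error?) {
    guard let currentFrame = sceneView.session.currentFrame else { return }
    ...
    // Perform the hit test using currentFrame. `box` is assumed to be an unwrapped
    // VNTextObservation. hitTest(_:types:) takes a single point in normalized image
    // coordinates (top-left origin), so use the box center and flip Vision's y axis.
    let center = CGPoint(x: box.boundingBox.midX, y: 1 - box.boundingBox.midY)
    let hits = currentFrame.hitTest(center, types: .featurePoint)
    ...
}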
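
On the hit-test point itself: ARFrame's hitTest(_:types:) expects one point in normalized image coordinates with the origin at the top left, while Vision reports bounding boxes in normalized coordinates with the origin at the bottom left. A minimal sketch of the conversion follows. The helper name hitTestPoint(forVisionBox:) is hypothetical, and it assumes the image handed to Vision has the same orientation as capturedImage (as with the .up orientation used in the question).

```swift
import Foundation

/// Hypothetical helper (not part of ARKit or Vision): maps the center of a
/// Vision bounding box to a normalized image point with a top-left origin.
func hitTestPoint(forVisionBox box: CGRect) -> CGPoint {
    // Center of the box in Vision's space (origin bottom-left, y grows upward).
    let centerX = box.origin.x + box.size.width / 2
    let centerY = box.origin.y + box.size.height / 2
    // Flip y so that the origin becomes the top-left corner.
    return CGPoint(x: centerX, y: 1 - centerY)
}

// A box hugging the bottom-left corner in Vision's space...
let point = hitTestPoint(forVisionBox: CGRect(x: 0.0, y: 0.0, width: 0.2, height: 0.1))
// ...maps to a point near the bottom-left of the image in top-left-origin space.
print(point)
```

In the handler you would pass box.boundingBox to this helper and hand the result to hitTest(_:types:), which returns an array of results that may be empty.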

You can use currentFrame to create the image request handler in session(_:didUpdate:) as well:

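// currentFrame lives on the session and is optional, so unwrap it first.
guard let pixelBuffer = sceneView.session.currentFrame?.capturedImage else { return }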

Also, note that calling perform(_:) on VNImageRequestHandler in session(_:didUpdate:) is not efficient: the delegate method fires for every new frame, so text detection runs all the time. You could use a Timer instead to reduce how often you run the detection.
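
As an alternative to a Timer (a different technique with the same goal), you could keep the delegate method and skip frames that arrive too soon after the last detection. The sketch below is an assumption, not part of ARKit: DetectionThrottle and shouldRun(at:) are hypothetical names, and in a real session you would feed it frame.timestamp.

```swift
import Foundation

/// Hypothetical gate that lets detection run at most once per `minInterval` seconds.
struct DetectionThrottle {
    let minInterval: TimeInterval
    private var lastRun: TimeInterval?

    init(minInterval: TimeInterval) {
        self.minInterval = minInterval
    }

    /// Returns true, and records the time, when enough time has passed since the last run.
    mutating func shouldRun(at timestamp: TimeInterval) -> Bool {
        if let last = lastRun, timestamp - last < minInterval {
            return false
        }
        lastRun = timestamp
        return true
    }
}

var throttle = DetectionThrottle(minInterval: 0.5)
let first = throttle.shouldRun(at: 10.0)   // the first call always runs
let second = throttle.shouldRun(at: 10.2)  // only 0.2 s later, so it is skipped
let third = throttle.shouldRun(at: 10.6)   // 0.6 s after the last run, so it runs
```

Inside session(_:didUpdate:) this would become a single guard such as `guard throttle.shouldRun(at: frame.timestamp) else { return }` before building the image request handler.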

Edit: Since image detection might take time to finish, currentFrame may no longer be the frame that was analyzed by the time the completion handler runs. You can store the frame in a property when making the request, and use that property inside the completion handler:

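var detectionFrame: ARFrame?

// Timer block
// currentFrame is optional, so unwrap it before storing and using it.
guard let frame = sceneView.session.currentFrame else { return }
detectionFrame = frame
let pixelBuffer = frame.capturedImage
// image detection request code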


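func detectTextHandler(request: VNRequest, error: Error?) {
    guard let frame = detectionFrame else { return }
    ...
    // `box` is assumed to be an unwrapped VNTextObservation. Use its center,
    // flipped to the top-left-origin image coordinates that hitTest expects.
    let center = CGPoint(x: box.boundingBox.midX, y: 1 - box.boundingBox.midY)
    let hits = frame.hitTest(center, types: .featurePoint)
    ...
}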
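
If you would rather not keep the frame in a property at all, another option is to build the request per frame with a closure that captures the frame, which is close to what the question asks for. The following is only a sketch under assumptions: TextAnchorPlacer and handleText(request:error:frame:) are hypothetical names, and the image is assumed to be processed with the .up orientation as in the question.

```swift
import ARKit
import Vision

// Sketch only: `TextAnchorPlacer` is a hypothetical class, not from the original post.
final class TextAnchorPlacer: NSObject, ARSessionDelegate {
    let sceneView: ARSCNView

    init(sceneView: ARSCNView) {
        self.sceneView = sceneView
        super.init()
        sceneView.session.delegate = self
    }

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // The closure captures `frame`, so the handler receives the frame that was analyzed.
        let textRequest = VNDetectTextRectanglesRequest { [weak self] finishedRequest, error in
            self?.handleText(request: finishedRequest, error: error, frame: frame)
        }
        textRequest.reportCharacterBoxes = true

        let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage,
                                            orientation: .up,
                                            options: [:])
        do {
            try handler.perform([textRequest])
        } catch {
            print(error)
        }
    }

    private func handleText(request: VNRequest, error: Error?, frame: ARFrame) {
        guard let observations = request.results as? [VNTextObservation] else {
            print("no result")
            return
        }
        for box in observations {
            // Vision's y axis points up; hit testing expects a top-left origin.
            let center = CGPoint(x: box.boundingBox.midX, y: 1 - box.boundingBox.midY)
            // Skip boxes whose center does not hit any feature point.
            guard let hit = frame.hitTest(center, types: .featurePoint).first else { continue }
            sceneView.session.add(anchor: ARAnchor(transform: hit.worldTransform))
        }
    }
}
```

This removes the shared detectionFrame state, at the cost of creating a new request for each analyzed frame. The earlier caveat still applies: running detection on every frame is expensive, so you would normally limit how often the request is made.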