I am using the Vision framework to detect text boxes in an ARKit session. My problem is accessing the frame to perform a hit test once I have detected the boxes.
func startTextDetection() {
    let textRequest = VNDetectTextRectanglesRequest(completionHandler: self.detectTextHandler)
    textRequest.reportCharacterBoxes = true
    self.requests = [textRequest]
}
func detectTextHandler(request: VNRequest, error: Error?) {
    guard let observations = request.results else {
        print("no result")
        return
    }
    let result = observations.compactMap { $0 as? VNTextObservation }
    for box in result {
        // Hit test at the center of the detected box (normalized coordinates).
        let center = CGPoint(x: (box.topRight.x + box.bottomLeft.x) / 2,
                             y: (box.topRight.y + box.bottomLeft.y) / 2)
        if let hit = frame.hitTest(center, types: .featurePoint).first {
            let anchor = ARAnchor(transform: hit.worldTransform)
            sceneView.session.add(anchor: anchor)
        }
    }
}
Ideally I would pass it to the completion handler from the ARSession delegate method, but although the documentation says I can pass a completion handler there, I have not found a way to do it.
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // Retain the image buffer for Vision processing.
    let pixelBuffer = frame.capturedImage
    let requestOptions: [VNImageOption: Any] = [:]
    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                                    orientation: .up,
                                                    options: requestOptions)
    do {
        try imageRequestHandler.perform(self.requests)
    } catch {
        print(error)
    }
}
I could keep a dictionary and look the frame up, but that is not really elegant and it is prone to bugs and leaks. I would rather pass along the relevant frame at the point where I request the text detection.
Any ideas?
Why don't you use your session's currentFrame property inside the completion handler? It contains the current frame of the session, so you don't need to pass any frame instance to your completion handler anymore; it is simply accessible through your sceneView instance.
So you can change your completion handler like below:
func detectTextHandler(request: VNRequest, error: Error?) {
    guard let currentFrame = sceneView.session.currentFrame else { return }
    ...
    // Perform the hit test using currentFrame (center is the box center, as above).
    let hit = currentFrame.hitTest(center, types: .featurePoint)
    ...
}
You can use currentFrame to create the image request handler in session(_:didUpdate:) as well:
guard let pixelBuffer = sceneView.session.currentFrame?.capturedImage else { return }
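For illustration, a minimal sketch of session(_:didUpdate:) built around currentFrame, reusing the requests array from the question (everything else stays as in your code):
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // Grab the frame through the session rather than the delegate parameter.
    guard let pixelBuffer = sceneView.session.currentFrame?.capturedImage else { return }
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                        orientation: .up,
                                        options: [:])
    try? handler.perform(self.requests)
}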
Also note that calling the perform() method of VNImageRequestHandler in session(_:didUpdate:) is not efficient and costs a lot of processing, since it runs on every frame; you could use a Timer instead to reduce how often you run the image detection.
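A minimal sketch of that Timer approach, assuming a detectionTimer property and a runTextDetection() helper that wraps the request code above (both names are hypothetical):
var detectionTimer: Timer?

func startDetectionTimer() {
    // Run Vision text detection twice a second instead of on every ARFrame.
    detectionTimer = Timer.scheduledTimer(withTimeInterval: 0.5, repeats: true) { [weak self] _ in
        self?.runTextDetection()   // helper wrapping the VNImageRequestHandler code above
    }
}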
Edit: Since image detection is asynchronous and might take time to finish, you can store the frame in a separate property when making the request, and use that property inside the completion handler:
var detectionFrame: ARFrame?

// Inside the Timer block: capture the frame the request is made against.
detectionFrame = sceneView.session.currentFrame
guard let pixelBuffer = detectionFrame?.capturedImage else { return }
// ... image detection request code ...
func detectTextHandler(request: VNRequest, error: Error?) {
    guard let frame = detectionFrame else { return }
    ...
    // Hit test against the stored frame (center is the box center, as above).
    let hit = frame.hitTest(center, types: .featurePoint)
    ...
}