
Text recognition from a live video stream using ML kit (with CMSampleBuffer)

I'm trying to modify the on-device text recognition example provided by Google here to make it work with a live camera feed.

When I hold the camera over text (text that the still-image example recognizes correctly), the console prints the following line over and over before the app eventually runs out of memory:

2018-05-16 10:48:22.129901+1200 TextRecognition[32138:5593533] An empty result returned from from GMVDetector for VisionTextDetector.

This is my video capture method:

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {

    if let textDetector = self.textDetector {

        // Wrap the camera frame and tell ML Kit how it is rotated.
        let visionImage = VisionImage(buffer: sampleBuffer)
        let metadata = VisionImageMetadata()
        metadata.orientation = .rightTop
        visionImage.metadata = metadata

        // detect(in:) is asynchronous; the closure runs when detection finishes.
        textDetector.detect(in: visionImage) { (features, error) in
            guard error == nil, let features = features, !features.isEmpty else {
                // Error. You should also check the console for error messages.
                // ...
                return
            }

            // Recognized and extracted text
            print("Detected text has: \(features.count) blocks")
            // ...
        }
    }
}

Is this the right way to do it?
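One likely contributor to the memory growth (an assumption, not something the log line above confirms) is that `detect(in:)` is asynchronous: `captureOutput` returns immediately, so a new detection is started for every frame even while earlier ones are still running, and each pending request keeps its sample buffer alive. A common remedy is to drop frames while a detection is in flight. The following is a minimal sketch of that idea as a hypothetical `FrameGate` helper (not part of ML Kit or AVFoundation) that admits one frame at a time, using only Foundation's `NSLock`:

```swift
import Foundation

/// Hypothetical helper (not part of ML Kit): allows at most one frame to be
/// processed at a time and tells the caller to drop frames that arrive while busy.
final class FrameGate {
    private let lock = NSLock()
    private var busy = false

    /// Returns true if the caller may start processing a frame now.
    /// Returns false if a previous frame is still being processed,
    /// in which case the new frame should simply be dropped.
    func tryEnter() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        if busy { return false }
        busy = true
        return true
    }

    /// Call exactly once after processing of an admitted frame has finished,
    /// for example at the end of the detector's completion handler.
    func leave() {
        lock.lock()
        busy = false
        lock.unlock()
    }
}
```

To use it, keep a single `FrameGate` instance on the view controller, `return` early from `captureOutput` when `tryEnter()` returns false, and call `leave()` (for example in a `defer` at the top of the `detect(in:)` completion handler) so the gate reopens whether detection succeeds or fails.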


  • ML Kit has since moved out of Firebase and become a standalone SDK (migration guide).

    The Quick Start sample app in Swift showing how to do text recognition from a live video stream using ML Kit (with CMSampleBuffer) is now available here:

    The live feed is implemented in CameraViewController.swift:
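For comparison, here is a minimal sketch of the same delegate method written against the standalone SDK, assuming the v2 text recognition API (`TextRecognizer.textRecognizer(options:)`, `TextRecognizerOptions`, `VisionImage(buffer:)` and the synchronous `results(in:)`). The class name `CameraTextReader`, the queue label and the fixed `.right` orientation are illustrative assumptions, not taken from the sample app. Calling `results(in:)` synchronously on the video output's serial queue, with `alwaysDiscardsLateVideoFrames` set to true, means frames that arrive while recognition is running are dropped by AVFoundation instead of piling up:

```swift
import AVFoundation
import UIKit
import MLKitTextRecognition
import MLKitVision

// Hypothetical class name; assumes the v2 ML Kit text recognition API (Latin script).
final class CameraTextReader: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let textRecognizer = TextRecognizer.textRecognizer(options: TextRecognizerOptions())
    private let videoQueue = DispatchQueue(label: "text-recognition.video") // hypothetical label

    /// Adds a video data output that delivers frames to this object on a background queue.
    func attach(to session: AVCaptureSession) {
        let output = AVCaptureVideoDataOutput()
        // Drop frames that arrive while the delegate is still busy with the previous one.
        output.alwaysDiscardsLateVideoFrames = true
        output.setSampleBufferDelegate(self, queue: videoQueue)
        if session.canAddOutput(output) {
            session.addOutput(output)
        }
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        let image = VisionImage(buffer: sampleBuffer)
        // Assumption: portrait UI with the back camera. A real app should derive
        // this from the device orientation and the active camera position.
        image.orientation = .right

        let recognizedText: Text
        do {
            // Synchronous call: blocks this delegate queue until recognition finishes.
            recognizedText = try textRecognizer.results(in: image)
        } catch {
            print("Text recognition failed: \(error)")
            return
        }
        print("Detected text has: \(recognizedText.blocks.count) blocks")
    }
}
```

The synchronous call keeps at most one frame in flight without any extra bookkeeping; the trade-off is that the delegate queue is blocked during recognition, so it must not be the main queue.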