ios swift swiftui apple-vision swift-structs

Trouble assigning text recognized by Apple's Vision framework to the instance attribute of a struct for display

I am trying to read text from an image using Apple's Vision framework. I followed this guide: https://developer.apple.com/documentation/vision/recognizing_text_in_images

With respect to my code, the image in question is self.image, which is a UIImage and the text to be displayed is self.recognizedText, which is a String. I am having trouble assigning the text recognized by a text recognition request to self.recognizedText in the completion handler, recognizeTextHandler. Note that I do remember to convert self.image to a CGImage object before performing the request.

The code attached below is not my complete code. I have marked with a comment the line where I assign the text recognized by Vision to self.recognizedText. I have left out UI components and certain Boolean states that I use to control which UI components are rendered. I am developing for iOS with SwiftUI.

struct AnnotatorView: View {
    @State private var image: UIImage?
    @State private var recognizedText: String = "No text recognized."
    
    func recognizeTextHandler(request: VNRequest, error: Error?) -> Void {
        guard let results = request.results as? [VNRecognizedTextObservation] else { return }

        // compactMap drops observations that have no candidate, so no force unwrap is needed.
        let recognizedStrings: Array<String> = results.compactMap({ result in result.topCandidates(1).first?.string })

        // Problematic code.
        self.recognizedText = recognizedStrings.joined()
    }
    
    func performTextRecognition(requestHandler: VNImageRequestHandler, request: VNRecognizeTextRequest) -> String {
        do {
            try requestHandler.perform([request])
            return "Text recognition succeeded."
        } catch {
            return "Could not perform text recognition request because of the following error: \(error)"
        }
    }
    
    var body: some View {
        VStack(spacing: 15){
            
            // Camera opening button.
            
            // Image opening button.
            
            // Fullscreen cover that displays camera and sets self.image to a UIImage object.

            // Fullscreen cover that displays image and recognized text.
            .fullScreenCover(isPresented: self.$isAnnotatedImageDisplayed) {
                VStack {
                    // Fullscreen cover closing button.

                    if let cgImage: CGImage = self.image?.cgImage {
                        let requestHandler: VNImageRequestHandler = VNImageRequestHandler(cgImage: cgImage)
                        let recognizeTextRequest = VNRecognizeTextRequest(completionHandler: recognizeTextHandler)
                        let textRecognitionStatus: String = performTextRecognition(requestHandler: requestHandler, request: recognizeTextRequest)

                        // Display whether recognition request went through, the image taken and text recognized (if any).
                        Text(textRecognitionStatus)

                        // Display image.

                        if (!self.recognizedText.isEmpty) {
                            Text("\(self.recognizedText)")
                        } else {
                            Text("No text recognized because the image is not good enough.")
                        }
                    } else {
                        Text("You haven't taken any pictures yet!")
                        Text("\(self.recognizedText)")
                    }
                }
            }
        }
    }
}

struct AnnotatorView_Previews: PreviewProvider {
    static var previews: some View {
        AnnotatorView()
    }
}

I have tried debugging on the console, but my console refuses to log any information. I have looked across the internet for solutions to this, but have come up empty.

My first attempt at fixing the issue was to test whether my text recognition request had gone through. I display the result on the fullscreen cover with Text(textRecognitionStatus). When I do, I see the message "Text recognition succeeded.", which I take to mean that the text recognition request completed without any errors.

I have tried storing the recognized text in an array rather than a string, thinking the problem might have to do with mutation, but it made no difference. I looked at common issues with struct mutation, but most of them had to do with mutation from outside the struct itself. I have also considered that the issue might come from using control flow statements in SwiftUI's declarative framework, but all other text displays correctly.

If there are any simple syntax errors here, such as extra or missing brackets, they were probably introduced while copying my code over to StackOverflow.

Solution

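Here is a working sample. As I said before, it is likely a timing issue because you are doing the work in the body. SwiftUI evaluates body to describe the UI, so a request started there runs during a view update, and SwiftUI treats modifying @State during a view update as undefined behavior, so the assignment may be dropped.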

Something else to think about is that VNRecognizeTextRequest seems to need a JPEG, likely because a JPEG does not have alpha/transparency.
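To make the timing point concrete before the full sample below, here is a minimal sketch (not the code from the question) that keeps body free of side effects by starting recognition from SwiftUI's .task modifier. It assumes iOS 15 or later and Swift 5.7 or later. The names RecognizedTextOverlay and recognizeText(in:) are hypothetical, and the sketch swaps the completion handler for reading request.results after the synchronous perform call returns.

```swift
import SwiftUI
import Vision

/// Hypothetical helper (not a Vision API): runs one text recognition request
/// and returns the top candidate of every observation, one per line.
func recognizeText(in image: UIImage) async -> String {
    guard let cgImage = image.cgImage else { return "" }

    let request = VNRecognizeTextRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage)
    do {
        // perform(_:) is synchronous, so results are available once it returns.
        try handler.perform([request])
    } catch {
        return error.localizedDescription
    }

    let observations = request.results ?? []
    // compactMap skips observations that have no candidate.
    return observations
        .compactMap { $0.topCandidates(1).first?.string }
        .joined(separator: "\n")
}

/// Hypothetical view: `image` is assumed to be supplied by the presenting view.
struct RecognizedTextOverlay: View {
    let image: UIImage
    @State private var recognizedText: String?

    var body: some View {
        VStack {
            Image(uiImage: image)
                .resizable()
                .scaledToFit()

            if let recognizedText {
                Text(recognizedText)
            } else {
                ProgressView() // Shown while recognition is still running.
            }
        }
        // The work starts when the view appears; body itself only describes UI.
        .task {
            recognizedText = await recognizeText(in: image)
        }
    }
}
```

A presenting view could show this from a fullScreenCover by passing in the unwrapped UIImage, so the state change happens after the view update rather than in the middle of it.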

import SwiftUI
import Vision
import VisionKit
struct TextRecognitionView: View {
    let model: TextRecognitionModel = .init()
    @State private var image: UIImage?
    @State private var recognizedText: String? = nil
    var body: some View {
        VStack(spacing: 15){
            VStack {
                switch image { //Unwrap the Image
                case .none:
                    Text("You haven't taken any pictures yet!")
                case .some(let image):
                    Image(uiImage: image)
                        .resizable()
                        .scaledToFit()
                    switch recognizedText { //Unwrap the text
                    case .none:
                        ProgressView() //Show this while recognizing
                    case .some(let text):
                        Text(text) //Show text
                    }
                }
                
                Button("set random text image") {
                    // Render a small text view into an image to use as test input.
                    let newImage = Text("Random text \(Int.random(in: 0...100))")
                        .frame(width: 100, height: 100)
                        .snapshot().validJPEG() // Vision seems to need a JPEG image, likely because of transparency/alpha
                    self.image = newImage
                    self.recognizedText = nil //Clear the text
                    Task {
                        do {
                            // Use the local constant instead of force-unwrapping the optional state.
                            self.recognizedText = try await model.performRequest(image: newImage)
                        } catch {
                            self.recognizedText = error.localizedDescription //Show an error to the user
                            print(error)
                        }
                    }
                }
            }
        }
    }
}

struct TextRecognitionModel {
    /// async await version of a VNRecognizeTextRequest + VNImageRequestHandler
    func performRequest(image: UIImage) async throws -> String {
        guard let cgImage: CGImage = image.cgImage  else {
            throw RequestErrors.unableToRetrieveImage
        }
        let requestHandler: VNImageRequestHandler = VNImageRequestHandler(cgImage: cgImage)
        
        return try await withCheckedThrowingContinuation({ continuation in
            let request = VNRecognizeTextRequest(completionHandler: { request, error in
                if let error {
                    continuation.resume(throwing: error)
                } else {
                    let results = request.results as? [VNRecognizedTextObservation] ?? []
                    // compactMap skips observations that have no candidate instead of crashing on a force unwrap.
                    let recognizedStrings: Array<String> = results.compactMap({ result in result.topCandidates(1).first?.string })
                    
                    continuation.resume(returning: recognizedStrings.joined())
                }
            })
            do {
                try requestHandler.perform([request])
            } catch {
                continuation.resume(throwing: error)
            }
        })
    }
}

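enum RequestErrors: LocalizedError {
    case unableToRetrieveImage

    // Without this, localizedDescription falls back to a generic system message.
    var errorDescription: String? {
        switch self {
        case .unableToRetrieveImage:
            return "Unable to retrieve a CGImage from the image."
        }
    }
}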

struct TextRecognitionView_Previews: PreviewProvider {
    static var previews: some View {
        TextRecognitionView()
    }
}

extension View {
    func snapshot() -> UIImage {
        let controller = UIHostingController(rootView: self)
        let view = controller.view
        
        let targetSize = controller.view.intrinsicContentSize
        view?.bounds = CGRect(origin: .zero, size: targetSize)
        view?.backgroundColor = .clear
        
        let renderer = UIGraphicsImageRenderer(size: targetSize)
        
        return renderer.image { _ in
            view?.drawHierarchy(in: controller.view.bounds, afterScreenUpdates: true)
        }
    }
}

extension UIImage {
    func validJPEG() -> UIImage {
        guard let data = self.jpegData(compressionQuality: 1) else {
            return .init()
        }
        
        guard let jpegImage = UIImage(data: data) else {
            return .init()
        }
        return jpegImage
    }
}
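
The performRequest(image:) method above bridges a callback-based API to async/await with a checked continuation. The same pattern can be tried in isolation with the sketch below, which needs neither Vision nor UIKit. Everything in it is a hypothetical stand-in rather than a Vision API: fakeRecognize(_:completion:) plays the role of a request with a completion handler, and a nil entry models an observation that has no candidate.

```swift
import Foundation

/// Hypothetical error type for the stand-in API below.
enum FakeRecognitionError: Error {
    case emptyInput
}

/// Hypothetical stand-in for a callback-based API. It calls `completion`
/// exactly once, with either a list of optional strings or an error.
func fakeRecognize(_ input: String,
                   completion: @escaping (Result<[String?], Error>) -> Void) {
    if input.isEmpty {
        completion(.failure(FakeRecognitionError.emptyInput))
        return
    }
    let words: [String?] = input.split(separator: " ").map { Optional(String($0)) }
    // The trailing nil models an observation without any candidate.
    completion(.success(words + [nil]))
}

/// async/await wrapper around the callback API.
/// The continuation is resumed exactly once on every path.
func recognizeAsync(_ input: String) async throws -> String {
    try await withCheckedThrowingContinuation { continuation in
        fakeRecognize(input) { result in
            switch result {
            case .success(let candidates):
                // compactMap drops the nil entries instead of force-unwrapping them.
                let text = candidates.compactMap { $0 }.joined(separator: " ")
                continuation.resume(returning: text)
            case .failure(let error):
                continuation.resume(throwing: error)
            }
        }
    }
}
```

Calling try await recognizeAsync("hello world") from an async context returns "hello world", and an empty input throws FakeRecognitionError.emptyInput. A checked continuation must be resumed exactly once, so if a callback API can both throw and invoke its handler for the same failure, the wrapper needs to make sure only one of those paths resumes it.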