Tags: swift, machine-learning, coreml, apple-vision, createml

Apple Vision Framework: LCD/LED digit recognition


I was developing an iOS app and everything seemed to work pretty well until I tried capturing images of digital clocks, calculators, blood pressure monitors, electronic thermometers, etc.

For some reason, the Apple Vision framework and VNRecognizeTextRequest fail to recognize text on primitive LCD screens like this one:

[image: example LCD screen]

You can try capturing numbers with Apple's sample project, or with any other sample project for the Vision framework, and it will fail to recognize the digits as text.

What can I do as an end user of the framework? Is there a workaround?


Solution

  • Train a model...

    Train your own .mlmodel using up to 10K images containing screens of digital clocks, calculators, blood pressure monitors, etc. You can do this in an Xcode Playground or in Apple's Create ML app.

    Here's code you can copy and paste into a macOS Playground:

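    import Foundation
    import CreateML
    
    // Folder whose subfolders are the class labels (one subfolder per label).
    let trainDir = URL(fileURLWithPath: "/Users/swift/Desktop/Screens/Digits")
    
    // let testDir = URL(fileURLWithPath: "/Users/swift/Desktop/Screens/Test")
    
    let model = try MLImageClassifier(
        trainingData: .labeledDirectories(at: trainDir),
        parameters: .init(featureExtractor: .scenePrint(revision: nil),
                          validation: .none,
                          maxIterations: 25,
                          augmentationOptions: [.blur, .noise, .exposure]))
    
    // Evaluating on the training folder gives an optimistic score;
    // switch to testDir once you have held-out images.
    let evaluation = model.evaluation(on: .labeledDirectories(at: trainDir))
    print("Classification error:", evaluation.classificationError)
    
    let url = URL(fileURLWithPath: "/Users/swift/Desktop/Screens/Screens.mlmodel")
    
    // Save the trained classifier so it can be added to an app target.
    try model.write(to: url)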
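
Once the model is written, one possible way to run it is through Vision's VNCoreMLRequest. The following is only a rough sketch, not part of the original answer: it assumes each subfolder of the training directory is one label, that the input is a CGImage already cropped to the region you want classified, and that the model is compiled at run time from the .mlmodel file. The names `bestLabel`, `classify` and the 0.5 confidence threshold are hypothetical.

```swift
import Foundation
import CoreML
import Vision

/// Hypothetical helper: returns the most confident label, or nil if
/// no candidate reaches the (assumed) minimum confidence.
func bestLabel(from candidates: [(label: String, confidence: Float)],
               minimumConfidence: Float = 0.5) -> String? {
    return candidates
        .filter { $0.confidence >= minimumConfidence }
        .max { $0.confidence < $1.confidence }?.label
}

/// Hypothetical helper: classifies one image with the trained model.
func classify(_ image: CGImage, modelURL: URL) throws -> String? {
    // A raw .mlmodel file must be compiled before Core ML can load it.
    let compiledURL = try MLModel.compileModel(at: modelURL)
    let visionModel = try VNCoreMLModel(for: MLModel(contentsOf: compiledURL))

    let request = VNCoreMLRequest(model: visionModel)
    request.imageCropAndScaleOption = .centerCrop

    // perform(_:) runs synchronously; results are set on the request afterwards.
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

    let observations = request.results as? [VNClassificationObservation] ?? []
    return bestLabel(from: observations.map {
        (label: $0.identifier, confidence: $0.confidence)
    })
}
```

In an app you would more likely add the .mlmodel to the Xcode target and use the generated class instead of compiling at run time; the run-time route is shown here only because it keeps the sketch self-contained.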
    


    Extracting text from an image...

    If you want to know how to extract text from an image using the Vision framework, look at this post.
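
For reference, a minimal sketch of plain text recognition with VNRecognizeTextRequest might look like the code below. It is not taken from the linked post. The function names `recognizeText` and `numericCharacters` are hypothetical, and disabling language correction for numeric readouts is an assumption worth testing on your own images. As the question notes, this route can still miss LCD/LED digits, which is why the trained classifier is suggested.

```swift
import Foundation
import Vision

/// Hypothetical helper: keeps only characters that can plausibly
/// appear on a numeric display (digits, '.', ':' and '-').
func numericCharacters(in text: String) -> String {
    let allowed = Set("0123456789.:-")
    return text.filter { allowed.contains($0) }
}

/// Hypothetical helper: returns the top candidate string for each
/// text region Vision detects in the image.
func recognizeText(in image: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    // Assumption: language correction targets natural-language words,
    // so it is switched off for numeric readouts.
    request.usesLanguageCorrection = false

    // perform(_:) runs synchronously; results are set on the request afterwards.
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

    let observations = request.results as? [VNRecognizedTextObservation] ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

A caller could then map the recognized strings through `numericCharacters(in:)` to drop units or stray letters before parsing a value.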