swift machine-learning coreml apple-vision createml

Apple Vision Framework: LCD/LED digit recognition

I was developing on an iOS app and everything seemed to work pretty well until I tried capturing images of digital clock, calculators, blood pressure monitors, electronic thermometers, etc.

For some reason Apple Vision Framework and VNRecognizeTextRequest fail to recognize texts on primitive LCD screens like this one:

You can try capturing numbers with Apple's sample project and it will fail. Or you can try any other sample project for the Vision Framework and it will fail to recognize digits as text.

What can I do as an end framework user? Is there a workaround?

Solution

Train a model...

Train your own .mlmodel using up to 10K images containing screens of digital clocks, calculators, blood pressure monitors, etc. For that you can use Xcode Playground or Apple Create ML app.

Here's a code you can copy and paste into macOS Playground:

import Foundation
import CreateML

let trainDir = URL(fileURLWithPath: "/Users/swift/Desktop/Screens/Digits")

// let testDir = URL(fileURLWithPath: "/Users/swift/Desktop/Screens/Test")

var model = try MLImageClassifier(trainingData: .labeledDirectories(at: trainDir), 
                                    parameters: .init(featureExtractor: .scenePrint(revision: nil), 
                                    validation: .none, 
                                 maxIterations: 25, 
                           augmentationOptions: [.blur, .noise, .exposure]))

let evaluation = model.evaluation(on: .labeledDirectories(at: trainDir))

let url = URL(fileURLWithPath: "/Users/swift/Desktop/Screens/Screens.mlmodel")

try model.write(to: url)

Extracting a text from image...

If you want to know how to extract a text from image using Vision framework, look at this post.