I am using Tesseract in iOS 8 for an OCR-based app, but it incorrectly converts the division symbol "÷" in the image to a plus sign "+".
For example, this image
always converts to the text string "8+4+4". It should be "8+4÷4".
I've tried different trained data language files ("eng+equ", "ita"), adding "÷" to the whitelist, setting the ocr_engine variable to cube, converting the image to grayscale or black and white, and upscaling the image by 2 and 4 times.
Everything I've tried always returns a plus "+" sign instead of a division "÷" symbol.
I tried using only the "equ" trained data file and that DOES return the division symbol correctly - but all other characters are then garbage.
I've been looking into this (Google, Stack Overflow) for several days and cannot figure it out.
How do I get Tesseract to include and recognize the division "÷" symbol?
UPDATE:
The best I have been able to do is to set the AVCaptureSession preset to high:
AVCaptureSession *session = [[AVCaptureSession alloc] init];
session.sessionPreset = AVCaptureSessionPresetHigh;
The dimensions of the captured image above are then 676 × 405 pixels. I then use the Tesseract OCR UIImage category (the image is named 'source') to binarize the image:
// Binarize the source image to improve contrast (using the UIImage category provided by TesseractOCR)
UIImage *blackAndWhiteImage = [source blackAndWhite];
[self.tesseract setImage:blackAndWhiteImage];
This will usually convert the division symbol to the text "-1-", but I've seen "-:-" and other numbers and uppercase characters between the minus signs.
I can check for that in the returned text. But then it is impossible to know whether to treat the returned text "8-1-2" as a true subtraction or 'maybe' division.
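A minimal sketch of the check I mean, written in plain C so it can sit in an Objective-C file. The helper name `find_division_candidate` is hypothetical and not part of Tesseract. It only flags a single character sandwiched between two minus signs; it cannot tell a misread "÷" from a genuine "8-1-2", so the ambiguity described above remains.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Tesseract API): returns the index of the first
 * "-X-" triple at or after `start`, where X is any single character other
 * than '-'. Returns -1 when no such triple exists. A hit is only a
 * candidate for a misread division symbol, not proof of one. */
static long find_division_candidate(const char *text, size_t start)
{
    size_t len = strlen(text);

    /* i + 2 < len keeps all three characters inside the string. */
    for (size_t i = start; i + 2 < len; i++) {
        if (text[i] == '-' && text[i + 1] != '-' && text[i + 2] == '-') {
            return (long)i;
        }
    }
    return -1;
}
```

The recognized text could be converted with `[text UTF8String]` and passed in; a non-negative result marks a span the app might show to the user for confirmation instead of silently treating it as subtraction.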
Train the OCR engine with different fonts.
Here is the tool for training the engine. Have a look at this as well.
Or you can use jTessBoxEditor.