I am using R, version 3.3.2. I am trying to parse some text using the new tesseract package. The image looks like this:
The code is simple:
library(tesseract)
engine <- tesseract(options = list(tessedit_char_whitelist = "0123456789abcdefghijklmnopqrstuvwxyz"))
text <- ocr("some_image_path.png", engine = engine)
The result is:
Too few characters. Skipping this page
Why doesn't it recognize any characters?
Because there are "Too few characters"? There seems to be a limit of
const int kMinCharactersToTry = 50;
which the character count is tested against; when that test fails, you get your error:
// If there are too few characters, skip this page entirely.
if (real_max < kMinCharactersToTry / 2) {
  tprintf("Too few characters. Skipping this page\n");
  return 0;
}
Try again with a sample that has at least 25 characters?
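To make the arithmetic in the quoted check concrete, here is a minimal stand-alone sketch. It only mirrors the comparison shown above; the function name WouldSkipPage is hypothetical and is not part of Tesseract, and real_max is assumed to be the character count the quoted code tests.

```cpp
// Hypothetical mirror of the quoted check, for illustration only.
// This is not Tesseract's actual function; only the constant and the
// comparison are taken from the snippet quoted above.
const int kMinCharactersToTry = 50;

// Returns true when a page with `real_max` counted characters would hit
// the "Too few characters. Skipping this page" branch.
bool WouldSkipPage(int real_max) {
  // Integer division: 50 / 2 == 25, so counts from 0 to 24 are skipped
  // and a count of 25 or more passes the check.
  return real_max < kMinCharactersToTry / 2;
}
```

Under these assumptions, WouldSkipPage(24) is true and WouldSkipPage(25) is false, so the effective threshold is 25 rather than 50.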