
tesseract package in R doesn't recognize any characters

I am using R version 3.3.2. I am trying to parse some text using the new tesseract package. The image looks like this:


The code is simple:

library(tesseract)

engine <- tesseract(options = list(tessedit_char_whitelist = "0123456789abcdefghijklmnopqrstuvwxyz"))
text <- ocr("some_image_path.png", engine = engine)

The result is:

Too few characters. Skipping this page

Why doesn't it recognize any characters?


  • Because there are too few characters? There seems to be a minimum of

    const int kMinCharactersToTry = 50;

    which is checked as follows; when the check fails, Tesseract prints your message and skips the page:

    // If there are too few characters, skip this page entirely.
    if (real_max < kMinCharactersToTry / 2) {
      tprintf("Too few characters. Skipping this page\n");
      return 0;
    }

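    Since kMinCharactersToTry / 2 is integer division (50 / 2 = 25) and the test is a strict "<", the page needs a count of at least 25 to pass. Try again with a sample that has at least 25 characters?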
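To make the boundary concrete, here is a small R sketch that restates the quoted C++ condition. It is only an illustration, not part of the tesseract R package: `too_few_characters` and `min_required` are hypothetical names, and `real_max` stands in for whatever count Tesseract computes internally, because the quoted snippet does not show how that value is derived.

```r
# Hypothetical R restatement of the C++ check quoted above (not a tesseract API).
kMinCharactersToTry <- 50L

# In C++, 50 / 2 on ints is integer division; %/% reproduces that in R.
min_required <- kMinCharactersToTry %/% 2L

# TRUE when the quoted code would print "Too few characters. Skipping this page"
too_few_characters <- function(real_max) {
  real_max < min_required
}

too_few_characters(24L)  # TRUE: page is skipped
too_few_characters(25L)  # FALSE: exactly 25 passes the strict "<" test
```

Because the comparison is strict, 25 is the smallest count that avoids the message. A count of 24 or fewer triggers it.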