ruby imagemagick ocr tesseract rmagick

Tesseract not seeing the image density of a file created in RMagick


I'm trying to use RMagick and Tesseract for OCR. I'm creating a simple image file with some text here:

    canvas = Magick::ImageList.new
    canvas.new_image(300, 300) { self.density = "500" }

    text = Magick::Draw.new
    text.annotate(canvas, 0,0,2,2, 'some_text') {
      self.font = font_path
      self.gravity = Magick::CenterGravity
      self.pointsize = 100
      self.density = '300'
    }

    canvas.write('tmp_text_img.png')

And I try to read it with a shell script here:

    `tesseract #{input} tmp_text_from_img`

However, Tesseract keeps giving me a warning:

    Warning. Invalid resolution 0 dpi. Using 70 instead.

This results in really poor accuracy, which I find strange because I'm explicitly setting the density twice when I create the image. Is there something I'm doing wrong? Or is there a way to force Tesseract to use the proper DPI?

Thanks all!


Solution

  • Solved it with:

    canvas.write('tmp_text_img.png') { self.units = Magick::PixelsPerInchResolution; self.density = "300" }
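
    To check whether the density actually made it into the file, you can look at the PNG's pHYs chunk, which is where a PNG stores its resolution. My understanding (an assumption, not verified against the Tesseract source) is that the "0 dpi" warning appears when that chunk is missing or has no unit, which would explain why setting `units` at write time helps. Below is a rough stdlib-only sketch; `png_dpi` is a hypothetical helper name, not part of RMagick or Tesseract.

```ruby
# Hypothetical helper (not part of RMagick or Tesseract).
# Takes the raw bytes of a PNG and returns [x_dpi, y_dpi] from its pHYs chunk,
# or nil when no resolution with a physical unit is stored.
def png_dpi(data)
  signature = "\x89PNG\r\n\x1A\n".b
  data = data.b
  raise ArgumentError, 'not a PNG' unless data.start_with?(signature)

  pos = signature.bytesize
  while pos + 8 <= data.bytesize
    # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
    length, type = data.byteslice(pos, 8).unpack('Na4')
    body = data.byteslice(pos + 8, length)
    break if body.nil? || body.bytesize < length # truncated file

    case type
    when 'pHYs'
      x_ppu, y_ppu, unit = body.unpack('NNC')
      return nil unless unit == 1 # 1 = pixels per metre, 0 = unit unknown
      # Convert pixels per metre to pixels per inch.
      return [(x_ppu * 0.0254).round, (y_ppu * 0.0254).round]
    when 'IDAT', 'IEND'
      return nil # pHYs has to appear before the image data
    end
    pos += 12 + length
  end
  nil
end

# Usage, assuming the file from the question exists:
#   p png_dpi(File.binread('tmp_text_img.png'))
```

    If the fix above worked, I would expect this to return a pair close to the density you set, and `nil` for the original file. As far as I know, newer Tesseract versions (4.x and later) also accept a `--dpi` option on the command line, which may be another way to force the resolution.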