Tags: split, numbers, ocr, tesseract, spacing

How to separate (space between) numbers in tesseract OCR


I am trying to extract numbers from an image, but the result I get is 2 332223355 1 23. I don't really understand how it decides where to split. All I need is for the one-, two- and three-digit numbers to be separated by spaces. Can anybody help me?


Solution

  • Use page segmentation mode 7, which treats the image as a single text line:

    tesseract -psm 7 NXect.png stdout

    For the image you provided, this gives:

    2 3 32 22 33 55 123‘
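
    The -psm spelling above matches the 3.x series; as far as I know, tesseract 4.0 and later expect --psm instead. A minimal sketch of picking the flag from the reported version follows. The function names psm_flag and ocr_single_line are made up for illustration, and the version cut-off is an assumption rather than something taken from this answer:

    ```shell
    # Choose the page-segmentation flag for a given tesseract version string.
    # Assumption: 3.x releases take -psm, 4.0 and later take --psm.
    psm_flag() {
        case "$1" in
            3.*) printf '%s\n' "-psm" ;;
            *)   printf '%s\n' "--psm" ;;
        esac
    }

    # Hypothetical wrapper: OCR an image as a single text line (mode 7).
    ocr_single_line() {
        # Older releases print the version banner on stderr, hence 2>&1.
        version=$(tesseract --version 2>&1 | head -n 1 | awk '{print $2}')
        tesseract "$(psm_flag "$version")" 7 "$1" stdout
    }
    ```

    With these defined, ocr_single_line NXect.png should run the same command as above on 3.x and the --psm variant on newer installs.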
    

    The tesseract version I am using:

    $ tesseract --version
    tesseract 3.04.01
     leptonica-1.73
      libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0
    

    Without any options, this version gives me the following for the original image:

    Error in pixGenHalftoneMask: pix too small: w = 250, h = 58
    23 32 22 33 55 123
    

    and for the resized image (2x):

    $ tesseract  NXect_x2.png stdout
    23 32 22 33 55 123
    

    So I can't reproduce the OCR result you are getting from the image.
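
    The answer does not say how the 2x copy was made. One possible way, assuming ImageMagick is installed (on ImageMagick 7 the command may be magick rather than convert), is sketched below. The helper names scaled_name and upscale_and_ocr are hypothetical:

    ```shell
    # Hypothetical helper: derive the name of the upscaled copy,
    # e.g. NXect.png -> NXect_x2.png
    scaled_name() {
        printf '%s_x2.%s\n' "${1%.*}" "${1##*.}"
    }

    # Upscale the image to 200% with ImageMagick, then OCR the result.
    # Assumption: convert and tesseract are both on PATH.
    upscale_and_ocr() {
        out=$(scaled_name "$1")
        convert "$1" -resize 200% "$out" && tesseract "$out" stdout
    }
    ```

    Calling upscale_and_ocr NXect.png would write NXect_x2.png next to the original and print the recognized text. Upscaling also tends to avoid the "pix too small" message that leptonica printed for the 250x58 original.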