algorithm image-processing language-agnostic ocr

OCR scanning from complex document


I need to create a tool that uses a high-quality camera to scan specific blocks of text from a document and OCR them. Every document follows the same template, which contains a few tables filled with data. I need to extract the data from one specific cell of each scanned document.

I need to account for rotation and minor transformations of the image. The whole workflow should look like this:

  1. The document is "shown" to the camera. The software takes a picture of the document.
  2. The software corrects for minor transformations (slight shearing, scaling and rotation can occur because the document is held by hand).
  3. The software verifies that a document matching the expected template is being shown and extracts the image of the specific cell.
  4. The image is then OCR'd.

Basically, I don't need a final solution, just some directions on where to start looking. I know how to OCR plain text; what I don't know is how to implement steps 2 and 3.

Thanks in advance.


Solution

  • Basically, OCR of plain text, especially from very good scanned images, is a well-solved task. What you describe goes a step further: image preprocessing and field-level recognition with data capture. As far as I know, open source engines (even Tesseract, which is considered the best among them) don't provide such functionality.

    At the same time, proprietary OCR engines have been solving the tasks you describe for years (with a lot of human effort invested) and have progressed very well. So if you're planning commercial software, I suggest you have a look at http://ocrsdk.com, a cloud OCR SDK with a web API. It lets you upload an image and sends you back the OCRed data. It already has the image preprocessing algorithms built in, so you won't have to worry about step 2. As for step 3, you may want to refer to this section of its documentation. I was part of the team that developed the front end of this service, so I can tell you a little more about it. Hope it helps!
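
If you would rather experiment with steps 2 and 3 yourself, here is a rough sketch of the geometry only, not a finished solution. It assumes you have already located a few known landmarks of the template in the photo (for example, corners of the table); finding them is not shown. Rotation, scaling and minor shearing together form an affine transform, which three point pairs determine. A fourth landmark can then serve as a crude check that the right template is being shown. All names here (`fit_affine`, `apply_affine`, `matches_template`) and the tolerance value are made up for illustration. Strong perspective distortion would need a full homography instead.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[List[float], List[float]]  # rows (a, b, c) and (d, e, f)


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve the 3x3 linear system a * x = b with Cramer's rule."""
    d = _det3(a)
    if abs(d) < 1e-12:
        raise ValueError("landmarks are collinear; pick three that form a triangle")
    solution = []
    for col in range(3):
        m = [list(row) for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(_det3(m) / d)
    return solution


def fit_affine(template_pts: Sequence[Point], photo_pts: Sequence[Point]) -> Affine:
    """Affine transform that maps three template points onto three photo points."""
    a = [[x, y, 1.0] for x, y in template_pts[:3]]
    row_x = _solve3(a, [p[0] for p in photo_pts[:3]])
    row_y = _solve3(a, [p[1] for p in photo_pts[:3]])
    return row_x, row_y


def apply_affine(t: Affine, pt: Point) -> Point:
    """Map one template coordinate into photo coordinates."""
    (a, b, c), (d, e, f) = t
    x, y = pt
    return (a * x + b * y + c, d * x + e * y + f)


def matches_template(t: Affine, template_pt: Point, photo_pt: Point,
                     tol: float = 5.0) -> bool:
    """Crude template check: a held-out landmark must land within tol pixels."""
    px, py = apply_affine(t, template_pt)
    return abs(px - photo_pt[0]) <= tol and abs(py - photo_pt[1]) <= tol


# Hypothetical data: four table corners in the template and where they were
# found in the photo (here the page is rotated 90 degrees, scaled 2x, shifted).
template = [(0.0, 0.0), (100.0, 0.0), (0.0, 50.0), (100.0, 50.0)]
photo = [(10.0, 20.0), (10.0, 220.0), (-90.0, 20.0), (-90.0, 220.0)]

transform = fit_affine(template, photo)
is_expected_template = matches_template(transform, template[3], photo[3])

# Corners of the target cell in template coordinates, mapped into the photo.
cell_in_template = [(40.0, 10.0), (60.0, 10.0), (60.0, 20.0), (40.0, 20.0)]
cell_in_photo = [apply_affine(transform, p) for p in cell_in_template]
```

With the cell corners known in photo coordinates, you can crop (or warp) that region and hand it to whatever OCR engine you already use for plain text.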