c# ocr image-recognition google-vision

Google Vision complex OCR execution with two-column text


Sorry if this question was already asked here before, but I was unable to find an answer.

So, I'm creating a C# OCR program to scan shop receipts. Vision OCR's DocumentTextDetection recognizes the text itself very accurately, but I have another problem:

When I scan a receipt (template shown in the image below), Vision OCR behaves strangely with dense two-column text. For instance, I have this receipt template format:

[Image: receipt template]

The response is usually a single-column string in which each entry is either a product name from the first column or a price from the second.

Here is an example of a typical response:

RECEIPT Product1 Product2 Product3 9.99 A Product4 9.99 A 12.10 A Product5

This response doesn't let me correctly match each item with its corresponding price.

When I decrease the distance between the product and price columns in an image editor (such as Photoshop), it works correctly and scans the receipt line by line, so I can easily identify which price belongs to which product.

My question is: could you give me a hint on how to programmatically adjust the distance between those two columns by creating a new image? Or would it be better to split the receipt image into two images, one per column, and OCR them separately? I honestly have no idea how to identify the space between the columns and cut them into new images, so any suggestions are welcome.


Solution

  • First, binarize the image, then apply an image processing step such as morphological erosion to prepare it for splitting in half along the space between the two columns. How do you find that space? Scan the image horizontally and look at the pixel values in each image column: the values are lowest in the black region between the two text columns, so the profile shows a clear drop there. Finally, you can run OCR on the resulting halves to detect the numbers.

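The horizontal scan described above can be sketched in plain C#. This is only a rough illustration under several assumptions, not a definitive implementation: the image has already been loaded into a grayscale array `gray[y, x]` (0 = black, 255 = white), the text is dark on a light background (so the gap shows up as image columns with no dark pixels; with the inverted polarity the comparison flips), and a fixed threshold of 128 is good enough for binarization. The class and method names (`ColumnSplitter`, `InkPerColumn`, `FindWidestGap`, `ShrinkGap`) are hypothetical. Instead of erosion, the sketch simply picks the widest ink-free run, which skips the narrow gaps between letters.

```csharp
using System;

public static class ColumnSplitter
{
    // Counts dark ("ink") pixels in each image column.
    // gray[y, x] is 0 (black) to 255 (white); threshold is an assumed fixed cutoff.
    public static int[] InkPerColumn(byte[,] gray, byte threshold = 128)
    {
        int height = gray.GetLength(0), width = gray.GetLength(1);
        var ink = new int[width];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                if (gray[y, x] < threshold) ink[x]++;
        return ink;
    }

    // Returns the start and length of the widest run of ink-free columns
    // that has text on both sides, or (-1, 0) if there is no such run.
    public static (int Start, int Length) FindWidestGap(int[] ink)
    {
        int first = Array.FindIndex(ink, v => v > 0);
        int last = Array.FindLastIndex(ink, v => v > 0);
        int bestStart = -1, bestLength = 0, runStart = -1;
        if (first < 0) return (bestStart, bestLength);

        for (int x = first; x <= last; x++)
        {
            if (ink[x] == 0)
            {
                if (runStart < 0) runStart = x;   // a new empty run begins
            }
            else if (runStart >= 0)
            {
                int length = x - runStart;        // the run ended at x - 1
                if (length > bestLength) { bestStart = runStart; bestLength = length; }
                runStart = -1;
            }
        }
        return (bestStart, bestLength);
    }

    // Builds a new image in which the gap is narrowed to at most keepWidth columns.
    public static byte[,] ShrinkGap(byte[,] gray, int gapStart, int gapLength, int keepWidth)
    {
        int height = gray.GetLength(0), width = gray.GetLength(1);
        int removed = Math.Max(0, gapLength - keepWidth);
        int cut = gapStart + gapLength - removed;  // first column that shifts left
        var result = new byte[height, width - removed];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width - removed; x++)
                result[y, x] = gray[y, x < cut ? x : x + removed];
        return result;
    }
}
```

To split the receipt into two images instead, `Start + Length / 2` gives a cut position in the middle of the gap. `ShrinkGap` covers the other option from the question: a single image with the two columns moved closer together. Real scans are noisy, so requiring exactly zero dark pixels in a column may be too strict; allowing a small tolerance is a likely adjustment.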