image-processing, computer-vision, identification

Computer Vision - Recognize 'A' from an image of 'A'


[Image: 'A' extracted from File_01]

[Image: 'A' extracted from File_02]

Hi, I'm a novice programmer who's having trouble with simple image processing.

My goal here is to make the program recognize that the two A's are... well, both A's. If you look carefully, you'll notice that they are slightly different (at the pixel level). Although any literate person can read both as 'A', I'm sure a program that compares pixel by pixel will not work, because the two A's are actually different. And to make things worse, the two have different dimensions - one is 48*60, the other is 48*61.

I wonder if there are ways for a program to 'read' both of them as A's. I have heard that this is something called computer vision (not so sure)... I would really prefer the method to be simple - it is not about identifying arbitrary characters, only 'A'. But if it can't be that way, any explanation of how to make the computer see both of these as A's is really welcome.

Thanks in advance :)


Solution

  • First: character recognition not only isn't a simple problem, it's not a completely solved problem.

    Are there many OCR implementations? Yes. Are those implementations good? It depends on the application. The more generalized you think OCR should be, the worse existing implementations look.

    Long story short, there are books dedicated to this very subject, and it takes a book of some length to provide answers at any meaningful level of detail.

    There are quite a few techniques for OCR (optical character recognition). Different techniques have been developed for (a) machine-printed characters versus (b) hand-written characters. Reading machine-printed characters is generally easier, but not necessarily easy. Reading handwritten characters can be very hard, and remains an incompletely solved problem. Keep in mind that there are other "scripts" (systems of characters for writing), and recognition techniques for Latin characters may be different than recognition techniques for traditional Chinese characters. [If you could write a mobile OCR application to read handwritten Chinese characters quickly and accurately, you could make a pile of money.]

    https://en.wikipedia.org/wiki/Optical_character_recognition

    There are quite a few approaches to OCR, and if you're interested in actually writing code to perform OCR, then naturally you should consider implementing at least one of the simpler techniques first. From your comments it sounds like you're already looking into that, but briefly: do NOT look at neural networks first. Yes, you'll probably end up there, but there's much to learn about imaging, lighting, and basic image processing before you can put neural network techniques to much use.

    But before you get in too deep, take some time to try to solve the problem yourself:

    1. Write code yourself (don't use someone else's code) to load an image from file into memory.
    2. Represent the image as a 2D array in memory.
    3. Think of ways you might distinguish just a few characters or shapes from one another. First assume those characters are perfectly reproduced. For example, if an image contains multiple exact copies of the characters "1" and "2," what is the simplest way you can imagine distinguishing those characters?
    4. Consider the same problem, but with characters that are only slightly different. For example, add a few "noise" pixels to each character. (A minimal sketch of these steps follows this list.)
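
    If it helps, here is a minimal sketch of steps 1-3 in Python. It is only an illustration under some assumptions: the file names are made up, the loader handles only the ASCII PGM (P2) format so it can be written by hand, and the pixel-counting "feature" is deliberately crude.

    ```python
    def load_pgm(path):
        """Minimal loader for ASCII PGM (P2) images -- a format simple enough to parse yourself."""
        with open(path) as f:
            tokens = []
            for line in f:
                tokens.extend(line.split("#", 1)[0].split())  # drop comments, keep the numbers
        assert tokens[0] == "P2", "this sketch only handles ASCII PGM"
        width, height = int(tokens[1]), int(tokens[2])  # tokens[3] is the maximum gray value
        values = [int(t) for t in tokens[4:4 + width * height]]
        # Step 2: the image as a 2D array (a list of rows of pixel values).
        return [values[r * width:(r + 1) * width] for r in range(height)]

    def ink_fraction(grid, threshold=128):
        """Fraction of 'dark' pixels -- one crude feature you might try for step 3."""
        dark = sum(1 for row in grid for px in row if px < threshold)
        return dark / (len(grid) * len(grid[0]))

    # Hypothetical files containing perfect copies of '1' and '2' (step 3):
    # a '2' carries noticeably more ink than a '1' of the same size, so even this
    # one-number feature can separate them. Add a few noise pixels (step 4) and
    # watch how quickly such a naive feature becomes unreliable.
    one = load_pgm("one.pgm")
    two = load_pgm("two.pgm")
    print(ink_fraction(one), ink_fraction(two))
    ```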

    After tinkering for a bit, read up on some basic image processing techniques. A good book is Digital Image Processing by Gonzalez and Woods.

    (Normalized correlation is a simple algorithm you can read about online and in books. It's useful for certain simple types of OCR. You can think of normalized correlation as a method of comparing a "stencil" of a reference 'A' character to samples of other characters that may or may not be 'A' characters--the closer the stencil matches the sample, the higher the confidence the sample is an A.

    So yes, try using OpenCV's template matching. First tinker with the OpenCV functions and learn when template matching works and when it fails, and then look more closely at the code.)
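
    If you do try OpenCV's template matching, a minimal sketch with the Python bindings might look like the following. The file names and the 0.8 threshold are just assumptions for illustration:

    ```python
    import cv2

    # Hypothetical files: a clean reference 'A' (the "stencil") and an image
    # that may or may not contain 'A' characters.
    template = cv2.imread("A_reference.png", cv2.IMREAD_GRAYSCALE)
    scene = cv2.imread("sample_page.png", cv2.IMREAD_GRAYSCALE)

    # Normalized correlation: slide the stencil over the scene and score every position.
    scores = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(scores)

    # Scores are roughly in [-1, 1]; the closer to 1, the closer the match to the stencil.
    print("best match score:", max_val, "at", max_loc)
    if max_val > 0.8:
        print("probably an 'A' near", max_loc)
    ```

    Keep in mind that plain template matching is neither scale- nor rotation-invariant, so even the 48*60 versus 48*61 difference between your two crops matters; resizing both to a common size before comparing is one simple workaround.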

    A recent survey of OCR techniques can be found in this book: Character Recognition Systems by Cheriet. It's a good starting point to investigate various algorithms. Some of the techniques will be quite surprising and counter-intuitive.

    To learn more about how humans recognize characters--the details of which are often surprising and counter-intuitive--read the book Reading in the Brain by Dehaene. This book is quite readable and requires no special math or programming skills.

    Finally, for any OCR algorithm it's important to keep the following in mind:

    1. Image quality is important. Control image acquisition and lighting as best you can. Develop a good gut feeling for the effects of light, shadow, etc., on OCR results.
    2. Set a goal for read rate accuracy. To avoid frustration, set a LOW goal at first--perhaps just 50%. There are various techniques for calculating what "accurate" means, but to start you can simply calculate the percentage of characters correctly identified or the percentage of words correctly identified. (A tiny example follows this list.) Achieving a read rate of 98% is not easy, and for some applications even that read rate is not particularly useful.
    3. Recognizing words adds another layer of complexity.
    4. It takes a long time to learn OCR in any depth. Take your time.
    5. Always revisit assumptions about how OCR algorithms "should" be written. Even if an implementation is clever in steps 2, 3, 4, and 5, a bone-headed choice for step 1 will hobble the overall implementation.
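
    As a tiny illustration of point 2, a character-level read rate can be computed as simply as this (the strings below are made up; real evaluations usually use edit distance so that insertions and deletions are handled sensibly):

    ```python
    def read_rate(expected, recognized):
        """Percentage of characters the OCR got right, compared position by position."""
        matches = sum(1 for e, r in zip(expected, recognized) if e == r)
        return 100.0 * matches / max(len(expected), 1)

    # Hypothetical ground truth vs. OCR output: 5 of 6 characters match.
    print(read_rate("AAABAA", "AABBAA"))  # -> 83.33...
    ```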

    Good luck!