Hi, I'm a novice programmer who's having trouble with simple image processing.
My goal is to make the program recognize that two images of the letter 'A' are... well, both A's. If you look carefully, you'll see that the two images differ slightly at the pixel level. Any literate person can read both as 'A', but I'm sure a program that compares them pixel by pixel will fail, because the pixels really are different. To make things worse, the two images have different dimensions: one is 48*60, the other is 48*61.
I wonder if there is a way for a program to 'read' both of them as A's. I've heard this falls under something called computer vision (not so sure)... I would really prefer the method to be simple, since I'm not trying to identify arbitrary characters, only 'A'. But if that's not possible, any explanation of how to make the computer see both of these as A's is really welcome.
Thanks in advance :)
First: character recognition not only isn't a simple problem, it's not a completely solved problem.
Are there many OCR implementations? Yes. Are those implementations good? It depends on the application. The more generalized you think OCR should be, the worse existing implementations look.
Long story short, there are books dedicated to this very subject, and it takes a book of some length to provide answers at any meaningful level of detail.
There are quite a few techniques for OCR (optical character recognition). Different techniques have been developed for (a) machine-printed characters versus (b) hand-written characters. Reading machine-printed characters is generally easier, but not necessarily easy. Reading handwritten characters can be very hard, and remains an incompletely solved problem. Keep in mind that there are other "scripts" (systems of characters for writing), and recognition techniques for Latin characters may be different than recognition techniques for traditional Chinese characters. [If you could write a mobile OCR application to read handwritten Chinese characters quickly and accurately, you could make a pile of money.]
https://en.wikipedia.org/wiki/Optical_character_recognition
There are quite a few approaches to OCR, and if you're interested in actually writing code to perform OCR, then naturally you should consider implementing at least one of the simpler techniques first. From your comments it sounds like you're already looking into that, but briefly: do NOT look at neural networks first. Yes, you'll probably end up there, but there's much to learn about imaging, lighting, and basic image processing before you can put neural network techniques to much use.
But before you get in too deep, take some time to try to solve the problem yourself.
After tinkering for a bit, read up on some basic image processing techniques. A good book is Digital Image Processing by Gonzalez and Woods.
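To give a concrete taste of what "basic image processing" means here, below is a minimal sketch in Python using OpenCV and NumPy (both assumptions on my part; install them if you want to try it). It binarizes each glyph image and scales it to a common size, which also deals with your 48*60 versus 48*61 mismatch. The filenames and the target size are placeholders.

```python
import cv2
import numpy as np

def preprocess(path, size=(48, 60)):
    """Load a glyph image, binarize it, and scale it to a common size."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Otsu's method picks a threshold automatically; invert so ink = white (255).
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Resize to a fixed size so images of slightly different dimensions line up.
    return cv2.resize(binary, size, interpolation=cv2.INTER_NEAREST)

# Placeholder filenames for your two 'A' images.
a1 = preprocess("a1.png")
a2 = preprocess("a2.png")

# A crude similarity score: fraction of pixels that agree after normalization.
agreement = np.mean(a1 == a2)
print(f"pixel agreement after normalization: {agreement:.2%}")
```

Even this crude pixel-agreement score will be far more forgiving than a raw pixel-by-pixel comparison of the original images, because the binarization and resizing remove most of the irrelevant differences.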
Normalized correlation is a simple algorithm you can read about online and in books, and it's useful for certain simple types of OCR. You can think of normalized correlation as comparing a "stencil" of a reference 'A' character against samples of other characters that may or may not be A's: the closer the stencil matches the sample, the higher the confidence that the sample is an 'A'.
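Here is a minimal sketch of that idea, assuming two equal-sized grayscale patches (for example, the preprocessed arrays from the earlier sketch). It computes zero-mean normalized correlation with plain NumPy, so you can see exactly what the score measures before reaching for a library.

```python
import numpy as np

def normalized_correlation(template, sample):
    """Normalized cross-correlation of two equal-sized grayscale patches.

    Returns a value in roughly [-1, 1]; values near 1 mean the sample
    closely matches the template (e.g. both look like the same 'A').
    """
    t = template.astype(np.float64)
    s = sample.astype(np.float64)
    # Subtract the means so overall brightness differences don't matter.
    t -= t.mean()
    s -= s.mean()
    denom = np.sqrt((t * t).sum() * (s * s).sum())
    if denom == 0:
        return 0.0  # one of the patches is completely flat
    return float((t * s).sum() / denom)

# Assuming a1 and a2 are the preprocessed arrays from the earlier sketch:
# print(normalized_correlation(a1, a2))  # close to 1.0 for two similar A's
```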
So yes, try using OpenCV's template matching. First tinker with the OpenCV functions and learn when template matching works and when it fails, and then look more closely at the code.
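As a starting point for that tinkering, here is a minimal sketch of OpenCV's template matching: cv2.matchTemplate with TM_CCOEFF_NORMED is essentially normalized correlation slid across a larger image. The filenames and the 0.8 threshold are placeholders I chose for illustration.

```python
import cv2

# Placeholder filenames: a larger image to search, and a small template of 'A'.
scene = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("a_template.png", cv2.IMREAD_GRAYSCALE)

# TM_CCOEFF_NORMED gives a normalized-correlation score in [-1, 1] at each position.
result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

print(f"best match score: {max_val:.3f} at location {max_loc}")
if max_val > 0.8:  # threshold chosen arbitrarily; tune it on your own images
    print("Looks like an 'A' here.")
```

Run it on images where you know the answer, then on images where template matching should fail (rotated, scaled, or differently-styled A's), and you'll quickly learn the technique's limits.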
A recent survey of OCR techniques can be found in this book: Character Recognition Systems by Cheriet. It's a good starting point to investigate various algorithms. Some of the techniques will be quite surprising and counter-intuitive.
To learn more about how humans recognize characters--the details of which are often surprising and counter-intuitive--read the book Reading in the Brain by Dehaene. This book is quite readable and requires no special math or programming skills.
Finally, for any OCR algorithm, keep its assumptions and limitations in mind: a technique that works well for one application can fail badly for another.
Good luck!