Digit Recognition with Bayesian classes

I need to write an OCR program for digits only. I will use MNIST datasets. The problem is I do not know where to start. There are a lot of papers which doesn't really explain the algorithm. I don't really have much knowledge about pattern recognition. So I have a few questions.

Q1 : Where can I find the algorithm (or a tutorial) Q2 : How do I classify digits? I don't need very advanced things. First thing that comes to my mind is finding the ratio of upper half/lower half and left side/ right side. Is there more useful and easy classification methods. Q3 : What is back propagation and the layers which is shown in most of the papers. Do I need them for my simple OCR.

Note: I know my OCR program won't be accurate. It isn't very important for now.

Solution

If the closest engineering library to you has a section on image processing, computer vision, or machine vision, then with luck that library will have a copy of a book I recommend for OCR:

Character Recognition Systems by Cheriet, Kharma, Liu, and Suen

This book provides a fairly comprehensive overview of OCR techniques and recent research. It does not go into great depth on any particular subject, but it does provide references to academic papers.

Make sure you have access to a good introductory textbook on image processing. The book by Gonzalez and Woods is a standard in many universities:

Digital Image Processing by Gonzalez and Woods

Even "simple" OCR gets tricky very quickly. It could be overwhelming if you jump into a class about neural networks, Bayes theorem, etc., before you have a firm grasp of basic image processing principles.

If you can, try writing one or more OCR algorithms for machine-printed characters before you attempt to write an algorithm for handwritten characters.

Q1 : Where can I find the algorithm (or a tutorial)

There are numerous algorithms for OCR. The Cheriet book will give you a good start.

Q2 : How do I classify digits? I don't need very advanced things. First thing that comes to my mind is finding the ratio of upper half/lower half and left side/ right side. Is there more useful and easy classification methods.

Try implementing that technique and see how well it works. Even if the implementation doesn't work as well as you'd like, lessons learned while implementing it could help you later.

You can also subdivide a character into a 2 x 2 grid or 3 x 3 grid and check for relatively densities of pixels. Unlike machine printed characters, handwritten characters won't line up nicely in rectilinear grids.

Template matching using normalized correlation is simple, and it can work reasonably well for machine printed characters for a single, known font. It's relatively simple to implement and worth learning: http://en.wikipedia.org/wiki/Cross-correlation#Normalized_cross-correlation

For OCR it's common to thin the characters in your sample as an initial step. Thinning is a technique to reduce a character (or any other shape) to a representation that is 1 pixel wide. Once you have a thinned character it can be easier to identify lines and intersections. If you can identify lines (or curves) and intesections, then one technique is to look at the relative position and angle of each line with respect to the others.

Common thinning algorithms include Stentiford and Zhang-Suen. There's a freeware version of WinTopo that demonstrates both of these algorithms: http://wintopo.com/

You can look into academic papers about "stroke extraction", but those techniques tend to be more difficult to implement.

Q3 : What is back propagation and the layers which is shown in most of the papers. Do I need them for my simple OCR.

These terms refer to artificial neural networks. For a simple OCR algorithm you'll hard-code the recognition logic OR use simple training methods. Artificial neural networks can be trained to recognize characters that aren't hard-coded in your software. http://en.wikipedia.org/wiki/Neural_network

Although you don't need to learn about artificial neural network to write a simple OCR algorithm, a simple algorithm will have only limited success with handwritten characters.

Above all, keep in mind that OCR for handwritten characters is an extremely difficult problem. If you could achieve a handwritten character read rate of 20% with a simple technique, then consider that a success.