Find the bounding box of each character in a license plate

I use Emgu with C# to read a license plate in an image. After edge detection I want to find the bounding box of each character in it and use neural networks to recognize the characters. How can I do it? Thanks

Solution

Well, since you can already detect the license plate, the simplest method is to look for the dividing lines. I'm afraid I can only speculate from Google Images for Iranian number plates (if this is what you're using); however, after each letter there is a break and a white or yellow area.

To find the bounding boxes of the individual letters:

You could look at the sum of each column and, where there is a peak in yellow or white, take that as a dividing point. Or you could sum only the black components (the writing): in ideal circumstances you start with a count of 0, find black components, and then return to a count of 0, and you have your letter. A little adaptive statistics may be needed here.

[EDIT] Segment the license plate from the image. Start by looking at the sum of each column: you will notice peaks of 255 * the height of the license plate image wherever a column is entirely white. Use this as your threshold, find the middle of each of these peaks, and you have a point that denotes a letter edge. You can segment your image using this data.
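A minimal sketch of that column-sum step, written in plain Python rather than C#/Emgu purely for illustration. It assumes the plate is already cropped and binarised into a list of rows whose pixels are 0 (black) or 255 (white); `column_sums` and `gap_midpoints` are hypothetical helper names, not Emgu calls.

```python
def column_sums(image):
    """Sum the pixel values down each column of a row-major 2D list."""
    return [sum(col) for col in zip(*image)]


def gap_midpoints(image):
    """Return the middle x position of every run of fully white columns."""
    height = len(image)
    full_white = 255 * height  # sum of a column that contains no black pixels
    sums = column_sums(image)
    midpoints = []
    start = None  # x where the current white run began, if any
    for x, s in enumerate(sums):
        if s == full_white and start is None:
            start = x
        elif s != full_white and start is not None:
            midpoints.append((start + x - 1) // 2)
            start = None
    if start is not None:  # white run that reaches the right edge
        midpoints.append((start + len(sums) - 1) // 2)
    return midpoints


# Tiny 2-row "plate": gap, letter, gap, letter, gap (assumed test data)
plate = [
    [255, 255, 0, 0, 255, 255, 255, 0, 255],
    [255, 255, 0, 255, 255, 255, 255, 0, 255],
]
cuts = gap_midpoints(plate)  # candidate x positions for cutting between letters
```

On a real plate the gaps will rarely sum to exactly 255 * height, so the equality test would probably need to become a tolerance.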

The peaks may be hard to segment reliably (they shouldn't be, but just in case). If so, invert your image so that white becomes black and black becomes white. Take the sum of the columns again; in this case the peaks are the locations of the letters. Now look for a change from 0 to a value greater than 0 and wait until you find a 0 again. Recording the x positions where this happens will give you your letter locations. I can give you the code for the sum of the columns if needed, but Google will also have your answer; the statistics required are up to you, just translate the steps.
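The inverted variant could be sketched like this, again in plain Python and again assuming a binarised plate stored as a list of rows (0 = black, 255 = white). `letter_spans` and `bounding_boxes` are hypothetical names. Trimming each span to the rows that contain ink is an addition of this sketch, to turn the x positions into full boxes.

```python
def letter_spans(image):
    """Return (x_start, x_end) column ranges, end exclusive, that contain ink."""
    # Invert so the letters become 255 and the background becomes 0.
    inverted = [[255 - p for p in row] for row in image]
    sums = [sum(col) for col in zip(*inverted)]
    spans = []
    start = None
    for x, s in enumerate(sums):
        if s > 0 and start is None:        # 0 -> non-zero: a letter begins
            start = x
        elif s == 0 and start is not None:  # back to 0: the letter has ended
            spans.append((start, x))
            start = None
    if start is not None:                   # letter touching the right edge
        spans.append((start, len(sums)))
    return spans


def bounding_boxes(image):
    """Return (x, y, width, height) for each letter span."""
    boxes = []
    for x0, x1 in letter_spans(image):
        # Rows that have at least one non-white pixel inside this span.
        rows = [y for y, row in enumerate(image)
                if any(p < 255 for p in row[x0:x1])]
        boxes.append((x0, rows[0], x1 - x0, rows[-1] - rows[0] + 1))
    return boxes


# Tiny 2-row "plate" with two letters (assumed test data)
plate = [
    [255, 255, 0, 0, 255, 255, 255, 0, 255],
    [255, 255, 0, 255, 255, 255, 255, 0, 255],
]
spans = letter_spans(plate)
boxes = bounding_boxes(plate)
```

Each box can then be used as the ROI that is cropped, resized and passed to the neural network.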

An alternative method

An alternative to dividing the image into separate squares or regions, and a favourite of students, is simply to scan a mask across the license plate. You feed the first ROI, say (0,0,100,100), into your Neural Network (NN), then move one pixel along the x axis to (1,0,100,100), and continue until you have read in all your data. You risk the NN over-detecting, since it can classify the same letter many times, so when you classify a letter you can jump 20 pixels or so to remove the duplicate classifications.

Obviously you will need to reduce the size of the license plate image to make this method quicker. I have seen accurate OCR using 9 by 9 arrays; however, you will require larger ones. Use your best judgement: 20x20 should suffice, but experiment.
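A rough sketch of the scanning loop with the jump after a hit, sliding the window horizontally across the plate. It is plain Python and only shows the control flow. `scan_plate` is a hypothetical name, and `classify` is a stand-in for cropping the ROI at position x and running it through your NN.

```python
def scan_plate(plate_width, window_width, classify, skip=20):
    """Slide a window across the plate and return (x, label) for each hit.

    classify(x) stands in for running the NN on the ROI that starts at x;
    it should return a label, or None when nothing is recognised.
    """
    hits = []
    x = 0
    while x + window_width <= plate_width:
        label = classify(x)
        if label is not None:
            hits.append((x, label))
            x += skip  # jump ahead so the same letter is not reported again
        else:
            x += 1     # nothing found, move the window one pixel along
    return hits


# Hypothetical classifier: pretends an "A" is seen at x = 5..7
# and a "B" at x = 40..42.
def fake_classify(x):
    if 5 <= x <= 7:
        return "A"
    if 40 <= x <= 42:
        return "B"
    return None


hits = scan_plate(plate_width=100, window_width=20, classify=fake_classify)
```

The x stored with each hit is also the left edge of that letter's ROI, so the scan gives you approximate bounding boxes as a side effect.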

[EDIT] Efficiency

Which one is better? Well, it depends. They will all work (depending on NN training); however, the methods that find the bounding boxes of the individual letters can be hard to set up reliably. Scanning the mask across and feeding all the data into the NN is usually quite reliable but can be incredibly inefficient. If you are working with 20*20 images, that's 400 data points to feed into the NN on each pass, and you have to repeat that roughly (license plate width - 20) times, which is the maximum number of iterations through the loop.
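To put numbers on that, here is a small sketch of the arithmetic. The plate width of 200 pixels is an assumed example value and `scan_cost` is a hypothetical helper. Stepping one pixel at a time gives width - 20 + 1 window positions, which is the "width - 20" estimate plus the starting position.

```python
def scan_cost(plate_width, window=20):
    """Return (window positions, total NN input values) for a full scan."""
    positions = plate_width - window + 1  # one-pixel steps, first position included
    inputs_per_pass = window * window     # 20 * 20 = 400 values per NN call
    return positions, positions * inputs_per_pass


positions, total_inputs = scan_cost(200)  # 200 px wide plate (assumed)
```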

An NN can take a long time to train, but also to execute with large amounts of data (depending on the NN). Segmenting each letter first is more efficient, as you rely less on the NN and can feed more accurate data into it.

One complication is that if you use the OCR engine already built into EMGU, recognition is extremely quick, as you will be able to see in the EMGU example. The only way to decide on the best method is to write and compare all 3 methods. If you just need one that works, use the NN scanning one, and where you get a match note that as your letter ROI, since you will still know the x position along the license plate.

I'm sorry I can't give you a more direct answer on which one is best, but there are too many factors that could affect things.

I hope some of these methods help,

Many Thanks Chris