opencv image-processing computer-vision ocr contour

How to remove unwanted noise in between the digits of an image

I am working on an optical character recognition project using OpenCV. I have implemented OCR successfully on normal digits, but in a real-time scenario I am having a problem with unwanted noise in between the digits. Original image:

I converted it to grayscale and applied a threshold, which gives this:

If I apply contour detection, I also get those black blocks. How can I eliminate the blocks between the digits? I have no problem with the OCR itself; I just want to remove the unwanted noise and rotate the image. Thank you.

After rotating and removing the black blobs from the image, this is the progress I have made. How can I remove the blocks that are attached to the first digit because of the shadow?

I am having a problem with the OCR: the first and last 2 digits are not recognised correctly. How can I improve the accuracy of the OCR? After training the system with sample digits taken from the real-time images where it was going wrong, I got the right results.

Final OCR image:

Solution

Removing the black bars without a priori knowledge of the geometry, that is, by pure blob analysis, is virtually impossible. It is also impossible to keep them from touching the digits, because of the strong shadows at the bottom.

I suggest you do your best to find those black bars, which are the places where the blobs have the greatest vertical extent. It may also be possible to locate them in a profile obtained by averaging over the columns (six strong local minima).
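As a rough illustration of the column-profile idea, here is a minimal NumPy sketch. It assumes a grayscale image with dark bars on a light background that are close to vertical. The function names and the threshold value are hypothetical choices for this example, not something taken from the question; on a real image the threshold would need tuning, or you could search for local minima of the profile instead.

```python
import numpy as np

def column_profile(gray):
    """Mean intensity of each column; dark columns give low values."""
    return gray.astype(np.float64).mean(axis=0)

def find_dark_runs(profile, threshold):
    """Return (start, end) column ranges, end exclusive, where profile < threshold."""
    dark = profile < threshold
    # Pad with False so that every run has both a rising and a falling edge.
    padded = np.concatenate(([False], dark, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(edges[0::2], edges[1::2])]

# Synthetic stand-in for the thresholded image (assumption: 20 x 60, white background).
image = np.full((20, 60), 255, dtype=np.uint8)
image[:, 10:13] = 0      # full-height bar
image[:, 30:34] = 0      # another full-height bar
image[5:15, 20:23] = 0   # a "digit" stroke covering only half the height

profile = column_profile(image)
runs = find_dark_runs(profile, threshold=64)  # threshold is an assumed value
print(runs)
```

The digit stroke is not reported because it only darkens half of its columns, so its column mean stays well above the threshold. This is the property the profile relies on: the bars span more of the height than the digits do.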

When you have located these bars horizontally, you can erase them in the original image by filling them with white rectangles. You can also locate them vertically in their respective slices and use this information to perform deskewing, followed by erasure. You can also predict the position of the leftmost and rightmost voids.
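The erasure and the skew estimate could be sketched as follows, again with NumPy only. This assumes the bars have already been located as (start, end) column ranges with an exclusive end. All function names, the margin, and the darkness threshold are hypothetical. The skew is estimated by fitting a straight line through the top edge of each bar, which is only one possible reading of "locate them vertically"; the sign of the angle follows image coordinates, where rows grow downward.

```python
import numpy as np

def erase_bars(gray, runs, margin=1, fill=255):
    """Return a copy of gray with each (start, end) column range painted white."""
    out = gray.copy()
    width = out.shape[1]
    for start, end in runs:
        lo = max(start - margin, 0)
        hi = min(end + margin, width)
        out[:, lo:hi] = fill
    return out

def bar_top_row(gray, run, threshold=128):
    """First row where the bar's slice is dark on average, or None if there is none."""
    start, end = run
    row_means = gray[:, start:end].astype(np.float64).mean(axis=1)
    dark_rows = np.flatnonzero(row_means < threshold)
    return int(dark_rows[0]) if dark_rows.size else None

def estimate_skew_degrees(runs, tops):
    """Fit a line through the bar tops; the slope gives the skew angle."""
    centers = [(s + e - 1) / 2.0 for s, e in runs]
    slope, _intercept = np.polyfit(centers, tops, 1)
    return float(np.degrees(np.arctan(slope)))

# Synthetic example (assumption): three bars whose tops drop by 3 rows every 30 columns.
image = np.full((40, 100), 255, dtype=np.uint8)
runs = [(10, 13), (40, 43), (70, 73)]
for (start, end), top in zip(runs, [5, 8, 11]):
    image[top:35, start:end] = 0

tops = [bar_top_row(image, run) for run in runs]
skew = estimate_skew_degrees(runs, tops)
cleaned = erase_bars(image, runs)
print(tops, round(skew, 2))
```

In practice you would rotate the image by the estimated angle first (for example with OpenCV's `cv2.getRotationMatrix2D` and `cv2.warpAffine`), locate the bars again, and only then erase them, so that the white rectangles stay aligned with the bars.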

This is the kind of result you can achieve: