I'm trying to detect the border of the scanned documents because it will help increase my OCR extraction rate. Borders are considered marginal noise so I have to get rid of them. Borders usually have the highest density in an image.
I have examined every column of pixels in the image, and the column with the highest density is probably a border, but only if it is actually a line. That's where my problem arises: I don't know how to detect whether a column of pixels is a line or not.
Any help would be very much appreciated. Thanks.
You could use the Hough line transform, but it will also return lines from the content you need to run OCR on.
The simplest solution I can think of, based on your question, is this. Since it's a border, you can reduce the search space using some threshold on the width and height. For example, if your image is 'w x h' and your search-space width is 's', your search space will be the columns '0 to s' and 'w-s to w' and the rows '0 to s' and 'h-s to h'.
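Here is a rough sketch of that idea in plain Python, in case it helps. It assumes the image is already binarized into a list of rows with 1 for ink and 0 for background. The function names and the `min_fraction` cutoff are my own placeholders, not from any library. To decide whether a dense column is really a line, it looks at the longest unbroken run of ink pixels rather than the raw density, on the assumption that a border is mostly continuous while text is broken up.

```python
def search_bands(w, h, s):
    """Return the four edge strips of a w x h image as index ranges."""
    sx = min(s, w // 2)  # clamp so left/right strips never overlap
    sy = min(s, h // 2)  # clamp so top/bottom strips never overlap
    return {
        "left": range(0, sx),
        "right": range(w - sx, w),
        "top": range(0, sy),
        "bottom": range(h - sy, h),
    }


def longest_run(pixels):
    """Length of the longest unbroken run of ink (truthy) pixels."""
    best = run = 0
    for p in pixels:
        run = run + 1 if p else 0
        best = max(best, run)
    return best


def find_vertical_borders(image, s, min_fraction=0.8):
    """Return x positions of edge-strip columns that look like a line.

    image: list of rows, 1 = ink, 0 = background (assumed binarized).
    min_fraction: hypothetical cutoff; a column counts as a line when
    its longest ink run covers at least this share of the height.
    """
    h, w = len(image), len(image[0])
    bands = search_bands(w, h, s)
    hits = []
    for x in list(bands["left"]) + list(bands["right"]):
        column = [image[y][x] for y in range(h)]
        if longest_run(column) >= min_fraction * h:
            hits.append(x)
    return hits


# Tiny example: a 10 x 10 image with a solid line at x=1 (border),
# a solid line at x=5 (inside the page) and a broken column at x=8.
image = [[0] * 10 for _ in range(10)]
for y in range(10):
    image[y][1] = 1
    image[y][5] = 1
    if y % 2 == 0:
        image[y][8] = 1

borders = find_vertical_borders(image, s=3)
```

With 's = 3' only columns 0 to 2 and 7 to 9 are examined, so the line at x=5 is never considered, and the broken column at x=8 is rejected because its longest run is too short. Horizontal borders would work the same way on the 'top' and 'bottom' strips, reading rows instead of columns. The cutoff will need tuning for real scans, where borders are often skewed or partly faded.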