I'm interested in how to mine information from the internet and how to extract text from an image.
I've been searching for information on how to do this, and I would like to program it on my own. Are there any papers that give a good explanation of mining and text extraction?
Can someone help me get started?
Kind regards,
You can take a look at Tess4J, which is a Java wrapper for Tesseract. That being said, text extraction from images usually requires some pre-processing first. Two of the most common steps are removing colours and cropping away sections that you know contain no text.
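As a rough illustration of those two pre-processing steps, here is a minimal sketch that uses only the standard Java imaging classes. The class name OcrPreprocess, the method prepare and the textRegion parameter are names I made up for the example, and the choice of region is something you would have to work out for your own images. Plain grayscale conversion is only one way of "removing colours"; thresholding to black and white is another.

```java
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of Tess4J or Tesseract.
final class OcrPreprocess {

    /**
     * Crops the image to the region expected to contain text and
     * converts the result to 8-bit grayscale.
     */
    static BufferedImage prepare(BufferedImage source, Rectangle textRegion) {
        // Clip the requested region to the image bounds so getSubimage cannot fail.
        Rectangle bounds = new Rectangle(0, 0, source.getWidth(), source.getHeight());
        Rectangle clipped = bounds.intersection(textRegion);
        if (clipped.isEmpty()) {
            throw new IllegalArgumentException("textRegion does not overlap the image");
        }

        // Step 1: drop the sections that are known to contain no text.
        BufferedImage cropped =
                source.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height);

        // Step 2: remove colour by drawing into a grayscale image.
        BufferedImage gray = new BufferedImage(
                clipped.width, clipped.height, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            g.drawImage(cropped, 0, 0, null);
        } finally {
            g.dispose();
        }

        // Assumption: with Tess4J on the classpath, the result could then be
        // passed to OCR, e.g. new net.sourceforge.tess4j.Tesseract().doOCR(gray).
        return gray;
    }
}
```

You would call it as OcrPreprocess.prepare(ImageIO.read(file), new Rectangle(x, y, w, h)) and hand the returned image to the OCR engine. How much this helps depends heavily on the input, so treat it as a starting point rather than a recipe.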