java image-processing mining

Suggestions about how to mine information on the internet and extract text out of an image


I'm interested in how I can mine information on the internet and how to extract text out of an image.

I'm looking for information on how to do this because I would like to program it on my own. Are there any papers that give a good explanation of mining and extraction?

Can someone point me in the right direction?

Kind regards,


Solution

  • You can take a look at Tess4J, which is a Java wrapper for Tesseract. That said, extracting text from an image usually requires some pre-processing first. Two of the most common steps are removing colours (for example, converting to grayscale) and cutting away sections which you know contain no text.
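
To make the two pre-processing steps above more concrete, here is a minimal sketch using only the standard `java.awt.image` classes. The class name `OcrPreprocess`, the helper names `toGrayscale` and `crop`, and the luminance weights are my own choices for illustration; they are not part of Tess4J or Tesseract, and real scans may need more work (thresholding, deskewing, scaling).

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

// Hypothetical helper class: illustrates the pre-processing idea, not a Tess4J API.
class OcrPreprocess {

    /** Removes colour by converting every pixel to an 8-bit gray value. */
    static BufferedImage toGrayscale(BufferedImage src) {
        BufferedImage gray = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster raster = gray.getRaster();
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Common luminance weighting (an assumption; other weights work too).
                int lum = (r * 299 + g * 587 + b * 114) / 1000;
                raster.setSample(x, y, 0, lum);
            }
        }
        return gray;
    }

    /**
     * Keeps only the region expected to contain text.
     * The returned image shares pixel data with the source image.
     */
    static BufferedImage crop(BufferedImage src, Rectangle region) {
        Rectangle bounds = new Rectangle(0, 0, src.getWidth(), src.getHeight());
        Rectangle r = region.intersection(bounds); // clamp to the image
        if (r.isEmpty()) {
            throw new IllegalArgumentException("region lies outside the image");
        }
        return src.getSubimage(r.x, r.y, r.width, r.height);
    }

    public static void main(String[] args) {
        // Synthetic 100x50 image standing in for a real scan.
        BufferedImage input = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        BufferedImage gray = toGrayscale(input);
        // Made-up region of interest; in practice you would know where the text is.
        BufferedImage textArea = crop(gray, new Rectangle(10, 10, 60, 20));
        System.out.println(textArea.getWidth() + "x" + textArea.getHeight());
    }
}
```

The cropped grayscale image could then be passed on to the OCR step. As far as I know, Tess4J's `Tesseract` class has a `doOCR` method that accepts a `BufferedImage`, but check the documentation for the version you use.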