Does a website have to save an image in order to get text from it?

I am trying to build a website where the user uploads an image that contains text, and the website is supposed to extract that text. Do I need to save the image in order to get the text? If so, where should I save it? I ask because I was able to display the image on the website without saving it, but I could not get the text from it.

Solution

If you can afford to run a process as soon as the user uploads the image, you can use an OCR tool to extract the text directly, which avoids storing the image.

For example, the Tika project allows the extraction of text from images/documents by just running the .jar:

java -jar tika-app-1.25.jar -t uploadedImage.png

It seems to be an active project: its latest version (1.25) was released a month ago. It uses Tesseract for the OCR step, so Tesseract must also be installed on your host(s).
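If the upload only exists in memory, the same command can be fed through standard input instead of a file name. Below is a minimal sketch, assuming the Tika CLI parses standard input when no file argument is given (check this against your Tika version). The function names, the default jar path and the timeout are hypothetical choices for illustration.

```python
import subprocess


def run_with_stdin(command, data, timeout=60):
    """Run a command, send `data` to its stdin and return its stdout as text."""
    result = subprocess.run(
        command,
        input=data,               # bytes are piped in, so nothing is written to disk
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=True,               # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.decode("utf-8", errors="replace")


def extract_text(image_bytes, jar_path="tika-app-1.25.jar"):
    """Extract text from an in-memory image with the Tika command-line app."""
    # Assumption: with no file argument, Tika reads the document from stdin.
    return run_with_stdin(["java", "-jar", jar_path, "-t"], image_bytes)
```

In a request handler you would pass the raw bytes of the uploaded file to `extract_text` and return the resulting string. Starting a JVM for every request is slow, so this is only reasonable for low traffic.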

It has supported image recognition since version 1.17:

Apache Tika 1.17 has been released! This release includes new support for automatic image captioning

More information about the Tika project is available on its homepage and in its Javadoc.

To avoid blocking the request while the OCR runs, you could also put the images in some kind of queue, or just in your usual database, and process them later. This lets you perform the operation asynchronously and keep each image only for a limited time, until the OCR has been applied to it.
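The queue idea can be sketched with an in-process queue and a single background worker. This is only an illustration: `start_ocr_worker`, the `results` dictionary and the `None` shutdown sentinel are hypothetical, and `extract_text` stands for whatever OCR call you use. A real deployment would more likely use a persistent queue or a database table so that pending jobs survive a restart.

```python
import queue
import threading


def start_ocr_worker(jobs, results, extract_text):
    """Start a background thread that runs OCR on queued uploads.

    jobs         -- queue.Queue of (upload_id, image_bytes) tuples; None stops the worker
    results      -- dict that receives upload_id -> extracted text (or the exception raised)
    extract_text -- callable taking image bytes and returning text (your OCR call)
    """
    def worker():
        while True:
            job = jobs.get()
            if job is None:            # shutdown sentinel
                jobs.task_done()
                break
            upload_id, image_bytes = job
            try:
                results[upload_id] = extract_text(image_bytes)
            except Exception as exc:   # keep the worker alive if one image fails
                results[upload_id] = exc
            finally:
                jobs.task_done()       # the image bytes can be discarded after this

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The upload handler then only calls `jobs.put((upload_id, image_bytes))` and returns immediately. The text appears in `results` once the worker has processed that entry.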

Once processed, you could also compress the images and persist them in order to have some kind of back-up of the original content (just in case something fails).
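A minimal sketch of such a backup step, assuming gzip compression and a local directory as the store. `archive_image`, `restore_image` and the `.gz` naming scheme are hypothetical. Note that PNG and JPEG data is already compressed, so the space saved may be small.

```python
import gzip
from pathlib import Path


def archive_image(upload_id, image_bytes, archive_dir):
    """Compress the original upload and write it to archive_dir; return the file path."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    # upload_id must be generated by the server, never taken from user input,
    # otherwise it could be used to write outside archive_dir.
    target = archive_dir / f"{upload_id}.gz"
    target.write_bytes(gzip.compress(image_bytes))
    return target


def restore_image(path):
    """Read an archived upload back as the original bytes."""
    return gzip.decompress(Path(path).read_bytes())
```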