Tags: grails, io, apache-tika

Grails Tika Plugin - How do I send a file from a database to tika to parse


I've used this example to upload files to the database and download them again. How can I then send such a file to my TikaService and parse its contents?

The tutorial saves the filename and filedata separately. filedata is binary data.

I can parse a file in the app folder fine, but I need to parse one that is stored in the database.

Or can I parse an uploaded file without saving it to the database at all?

Thanks in advance.

EDIT - Error

ERROR errors.GrailsExceptionResolver  - MissingPropertyException occurred when processing request: [GET] /myApp/document/parse/8
No such property: inputstream for class: com.myApp.DocumentController. Stacktrace follows:
Message: No such property: inputstream for class: com.myApp.DocumentController

Solution

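  • The Apache Tika parse() method takes an InputStream as input. Since filedata is a byte array, you can wrap it in a ByteArrayInputStream to pass the file data from your domain class to Apache Tika. The MissingPropertyException in the edit most likely means that inputstream was referenced in the controller without being defined first; Groovy variable names are case-sensitive, so the stream has to be declared under exactly the name you use later.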

    def doc = Document.read(/*some id*/)
    def inputStream = new ByteArrayInputStream(doc.filedata)
    def parser = /* Your Apache Tika parser */
    def handler = /* An implementation of org.xml.sax.ContentHandler */
    def metadata = new org.apache.tika.metadata.Metadata()
    
    parser.parse(inputStream, handler, metadata)
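
To make the placeholders above concrete, here is a minimal sketch with one possible choice of parser and handler. It assumes Tika's AutoDetectParser and BodyContentHandler are on the classpath together with the Tika parser modules, and it uses the four-argument parse() from the Parser interface. The extractText helper is a hypothetical name used for illustration, not part of the Tika plugin.

```groovy
import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.AutoDetectParser
import org.apache.tika.parser.ParseContext
import org.apache.tika.sax.BodyContentHandler

// Hypothetical helper: extracts plain text from any InputStream.
String extractText(InputStream input) {
    def parser = new AutoDetectParser()       // detects the file type from the content
    def handler = new BodyContentHandler()    // collects the body text of the document
    def metadata = new Metadata()
    // withStream closes the stream once parsing has finished
    input.withStream { stream ->
        parser.parse(stream, handler, metadata, new ParseContext())
    }
    handler.toString()
}

// Hypothetical overload for binary data loaded from the database.
String extractText(byte[] data) {
    extractText(new ByteArrayInputStream(data))
}
```

In a controller action this could be called as extractText(Document.read(params.id).filedata). Because the helper only needs an InputStream, an upload could in principle be parsed without saving it first, for example with extractText(request.getFile('file').inputStream), where 'file' is an assumed name for the form field.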