java, maven, apache-tika

How to use a Tika custom parser in a jar file?


I'm trying to write a custom Apache Tika parser (for DICOM medical images), and package it as a plugin in a jar file.

I'm following the instructions at http://tika.apache.org/1.18/parser_guide.html, and used some existing projects as models.

So I created a Maven project, wrote a parser class and an org.apache.tika.parser.Parser file in the resources folder, and built the project with mvn install. I now have a jar file.

My question is, how do I make Tika use this new parser? The instructions on the Tika wiki say:

To install a plugin, download it according to instructions below and drop the jar(s) on your classpath. Tika will auto detect the plugin.

I tried to do this with java -classpath /path/to/my-parser.jar ... but it doesn't seem to work:

java -classpath /path/to/my-parser.jar -jar tika-app-1.18.jar --list-parsers

doesn't list the new parser, for instance.

I'm not a Java person, and I'm really not sure what "drop the jar on your classpath" means. I would really appreciate it if someone could point me in the right direction! Thanks.


Solution

  • You've run into a common Java beginner's mistake: for historical reasons, the java launcher doesn't combine the -jar and -classpath options. When -jar is given, the named jar is the only source of user classes, and the -classpath value you passed is ignored.

    To run the Apache Tika App from the command line with an extra parser jar or two added, use something like:

    java -classpath tika-app.jar:my-extra-parser.jar org.apache.tika.cli.TikaCLI --list-parsers
    

    This runs the main Tika App entry point, org.apache.tika.cli.TikaCLI (the class that -jar starts by default), with both the Tika App jar and your custom jar on the classpath. On Windows, separate the classpath entries with ; instead of :.

    You may also find the Troubleshooting Apache Tika guide from the Tika wiki useful when developing custom plugins like this!
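
If --list-parsers still doesn't show your parser, it can help to check whether the service registration in your jar is visible on the classpath at all. Below is a minimal sketch, assuming your jar registers the parser in META-INF/services/org.apache.tika.parser.Parser as the parser guide describes. It uses the JDK's own java.util.ServiceLoader (Java 9 or later for stream()), not Tika's loading code, so treat it as a rough classpath check rather than a reproduction of what Tika does. The class name ListServiceProviders is made up for this example.

```java
import java.util.Collections;
import java.util.List;
import java.util.ServiceLoader;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Tika: lists the provider classes that
// META-INF/services files on the classpath register for a service interface.
class ListServiceProviders {

    // Returns the sorted class names of all registered providers, or an
    // empty list if the service interface itself is not on the classpath.
    static List<String> providerNames(String serviceInterface) {
        try {
            return namesFor(Class.forName(serviceInterface));
        } catch (ClassNotFoundException e) {
            return Collections.emptyList();
        }
    }

    private static <S> List<String> namesFor(Class<S> service) {
        // Provider.type() reports the provider class without instantiating it.
        return ServiceLoader.load(service).stream()
                .map(provider -> provider.type().getName())
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Defaults to Tika's parser interface; pass another interface name to override.
        String service = args.length > 0 ? args[0] : "org.apache.tika.parser.Parser";
        List<String> names = providerNames(service);
        if (names.isEmpty()) {
            System.out.println("No providers found for " + service);
        }
        names.forEach(System.out::println);
    }
}
```

After compiling it with javac ListServiceProviders.java, you could run it the same way as the Tika App: java -classpath tika-app.jar:my-extra-parser.jar:. ListServiceProviders. If your parser's class name is missing from the output, the problem is probably in how the jar was built (for example, the services file is not where the loader expects it) rather than in how Tika was started.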