
Apache Tika parser not working in fat jar


I created an empty Maven app and added these dependencies:

tika-core
tika-parsers-standard-package
slf4j-api 
slf4j-simple

plus maven-assembly-plugin to build a fat jar.

The resulting file is 47M, and mvn dependency:tree prints 102 lines. Everything looks OK, but when I run this troubleshooting snippet

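import org.apache.tika.Tika;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.CompositeParser;
import org.apache.tika.parser.ParseContext;

Tika tika = new Tika();

// The default parser is an AutoDetectParser, which extends CompositeParser
CompositeParser parser = (CompositeParser) tika.getParser();

int cnt = 0; // running index of the printed types
for (MediaType type : parser.getSupportedTypes(new ParseContext())) {
    String typeStr = type.toString();
    System.out.println(++cnt + " " + typeStr);
}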

Tika reports only two supported MIME types:

video/mpeg
video/x-msvideo

I don't see a single warning from

java -jar target/smallTika-1.0-SNAPSHOT-jar-with-dependencies.jar -Dorg.apache.tika.service.error.warn=true

I've tried a lot of things: different configs and versions, switching to the Maven Shade plugin, and so on.

Funny thing: if I run the app from the IDE, it reports 229 MIME types and works like a charm.

Also, if I create an empty Spring Boot app and add

tika-core
tika-parsers-standard-package

as dependencies, the diagnostic snippet also works fine (packaged as a standard Spring Boot jar).

SOLUTION (Gagravarr's comment):

Switch to maven-shade-plugin and add this transformer:

<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
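
For context, here is a rough sketch of where that transformer line could sit in pom.xml. The plugin version and the main class com.example.Main are placeholders that do not come from the original post, and the ManifestResourceTransformer entry is only there so the jar can be started with java -jar; adjust both to your project.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <!-- assumed version; any recent release should behave the same here -->
  <version>3.5.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- merges all META-INF/services files instead of keeping one -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          <!-- hypothetical main class, replace with your own -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>com.example.Main</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```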

Solution

  • Apache Tika uses META-INF/services files to discover the available classes, such as detectors and parsers, at runtime.

    When building a shaded / fat jar, you need to take care with these services files. They are plain text files with one class name per line. You want to append them all together rather than keep just the first or last one (which is often the default with shading tools). Otherwise only the providers listed in the surviving file get registered.
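
To check whether the services files survived packaging, something like the following could be run both from the IDE and from the fat jar. It is only a sketch: the class name ServiceFileCheck and its two helper methods are made up for illustration, and it assumes the parser list lives in META-INF/services/org.apache.tika.parser.Parser. It lists every copy of the services file visible on the classpath and appends their entries, which is roughly what a correct merge should produce.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of Tika.
class ServiceFileCheck {

    /** Returns every copy of META-INF/services/<serviceName> visible to the loader. */
    public static List<URL> findServiceFiles(ClassLoader loader, String serviceName)
            throws IOException {
        return Collections.list(loader.getResources("META-INF/services/" + serviceName));
    }

    /** Appends the class names from all files, skipping blanks, # comments and duplicates. */
    public static Set<String> mergeProviders(List<URL> files) throws IOException {
        Set<String> providers = new LinkedHashSet<>();
        for (URL file : files) {
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(file.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    int hash = line.indexOf('#');
                    if (hash >= 0) {
                        line = line.substring(0, hash); // drop trailing comment
                    }
                    line = line.trim();
                    if (!line.isEmpty()) {
                        providers.add(line);
                    }
                }
            }
        }
        return providers;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the services file Tika uses for parsers.
        String service = args.length > 0 ? args[0] : "org.apache.tika.parser.Parser";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        List<URL> files = findServiceFiles(loader, service);
        System.out.println(files.size() + " service file(s) for " + service);
        files.forEach(url -> System.out.println("  " + url));
        System.out.println(mergeProviders(files).size() + " provider class name(s) in total");
    }
}
```

In the IDE each dependency jar contributes its own copy of the file, so several URLs should be printed. In a fat jar there is a single copy, and if it contains only a handful of class names, the other files were most likely overwritten during packaging rather than appended.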