I created an empty Maven app and added these dependencies:
tika-core
tika-parsers-standard-package
slf4j-api
slf4j-simple
plus maven-assembly-plugin to build a fat jar.
The resulting jar is 47 MB, and mvn dependency:tree prints 102 lines.
Everything looks OK, but when I run this troubleshooting snippet:
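import org.apache.tika.Tika;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.CompositeParser;
import org.apache.tika.parser.ParseContext;

int cnt = 0; // running count of reported types
Tika tika = new Tika();
// The default parser is a composite of every parser discovered at runtime
CompositeParser parser = (CompositeParser) tika.getParser();
for (MediaType type : parser.getSupportedTypes(new ParseContext())) {
    System.out.println(++cnt + " " + type);
}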
Tika reports only two supported MIME types:
video/mpeg
video/x-msvideo
I don't see a single warning from:
java -jar target/smallTika-1.0-SNAPSHOT-jar-with-dependencies.jar -Dorg.apache.tika.service.error.warn=true
I've tried many different configs and versions, switching to maven-shade-plugin, and so on.
Funny thing: if I run the app from the IDE, it reports 229 MIME types and works like a charm.
Also, if I create an empty Spring Boot app and add
tika-core
tika-parsers-standard-package
as dependencies, this diagnostic snippet also works fine (packaged as a standard Spring Boot jar).
SOLUTION (Gagravarr's comment):
Switch to maven-shade-plugin and add this transformer:
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
Apache Tika makes use of META-INF/services files to allow runtime discovery of available classes for things like Detectors and Parsers.
When building a shaded / fat jar, you need to take care with these services files. They are plain text files, one class name per line. You want to append them all together, and not take just the first or last one (which is often the default with shading tools).
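To check whether the services files survived packaging, a small diagnostic along these lines may help. It is only a sketch: the class name ServiceFileCheck and its two helper methods are made up for illustration, and it assumes the parser list lives in META-INF/services/org.apache.tika.parser.Parser. It prints every copy of that file visible on the classpath together with the number of class names each one contains.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Tika or Maven.
public class ServiceFileCheck {

    // Parse the text of a services file: one class name per line,
    // '#' starts a comment, blank lines are ignored.
    public static List<String> parseProviders(String content) {
        List<String> providers = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                providers.add(line);
            }
        }
        return providers;
    }

    // Collect class names from every copy of META-INF/services/<serviceName>
    // that the class loader can see, printing where each copy comes from.
    public static List<String> listProviders(String serviceName) throws IOException {
        List<String> providers = new ArrayList<>();
        ClassLoader cl = ServiceFileCheck.class.getClassLoader();
        Enumeration<URL> urls = cl.getResources("META-INF/services/" + serviceName);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            try (InputStream in = url.openStream()) {
                String content = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                List<String> found = parseProviders(content);
                System.out.println(url + " -> " + found.size() + " provider(s)");
                providers.addAll(found);
            }
        }
        return providers;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the services file Tika uses for parser discovery.
        String service = args.length > 0 ? args[0] : "org.apache.tika.parser.Parser";
        for (String provider : listProviders(service)) {
            System.out.println("  " + provider);
        }
    }
}
```

Run from the IDE, this would be expected to list one services file per Tika parser module on the classpath. Run from a fat jar, it can only ever find one copy; if that copy holds just a handful of class names, the files were overwritten during packaging rather than appended.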