
Gradle, Tika - Exclude some dependency packages making a "fat jar" too fat


I'm making an app which creates Lucene indices on a handful of well-known document formats (.docx, .odt, .txt, etc.).

Tika is ideal for extracting the text but it appears to be the culprit in making my fat jar balloon to 62 MB.

To make the fat jar I'm doing this in my build.gradle:

buildscript {
    repositories { jcenter() }
    dependencies {
        // Shadow plugin, used to build the fat jar
        classpath 'com.github.jengelman.gradle.plugins:shadow:1.2.4'
    }
}
apply plugin: 'com.github.johnrengelman.shadow'
shadowJar {
    baseName = project.name
    classifier = null
    version = project.version
}

task copyJarToBin(type: Copy) {
    from shadowJar
    into "D:/My Documents/Software projects/Operative/" + project.name
}

When I run gradle dependencies, Tika does indeed appear to have hundreds of transitive dependencies, most of which I obviously don't need.

Is there a known Gradle way of excluding/filtering out certain dependencies?

Specific to Tika: if anyone knows how to identify which dependencies handle which file types, that would be very useful too...


Solution

  • Take a look at Gradle dependency management. You can exclude transitive dependencies by module, by group, or by both:

    compile('library:with-a-lot-of-deps:1.0') {
        exclude module: 'weird-extension'
        exclude group: 'microsoft-extensions'
        exclude group: 'adobe-extensions', module: 'pdf-extension' 
    }
    

    And you can also remove dependencies from all configurations:

    configurations {
        all*.exclude group: 'all-the-unneeded-extensions'
    }
    

    I don't know Tika well enough to say which dependencies handle which file types, but that would probably be a separate question anyway. It might be a good idea to read the Tika docs and inspect the META-INF directory in its JARs.
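
As a rough illustration of how the exclusions above might look for Tika, here is a build.gradle fragment. It is a sketch under several assumptions, not a tested configuration: it assumes Tika is pulled in through the org.apache.tika:tika-parsers artifact, that the version shown matches yours, and that the org.apache.pdfbox and edu.ucar groups are among its transitive dependencies and are not needed for .docx, .odt or .txt. The task name listTikaParsers is made up for this example.

```groovy
// build.gradle (fragment), assuming the java plugin is already applied.
dependencies {
    // Version is an assumption; use whichever Tika version you depend on.
    compile('org.apache.tika:tika-parsers:1.13') {
        // Assumed unneeded for .docx/.odt/.txt. Check the output of
        // `gradle dependencies` before relying on these group names.
        exclude group: 'org.apache.pdfbox' // PDF parsing
        exclude group: 'edu.ucar'          // NetCDF and similar scientific formats
    }
}

// Hypothetical helper task: prints the parser classes that the
// tika-parsers JAR registers in its Java service-loader file, which
// gives a hint about which parser handles which format.
task listTikaParsers {
    doLast {
        def tikaJars = configurations.runtime.files.findAll {
            it.name.startsWith('tika-parsers')
        }
        tikaJars.each { jar ->
            def services = zipTree(jar).matching {
                include 'META-INF/services/org.apache.tika.parser.Parser'
            }
            services.each { println it.text }
        }
    }
}
```

Running gradle dependencyInsight --dependency pdfbox --configuration runtime before and after the change should show whether an exclusion took effect. Excluding a library that a parser needs usually only fails at runtime, when a document of that type is parsed, so it is worth testing with a sample of every format you intend to index.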