Tags: java, gradle, apache-tika

Tika parsers configuration in Spring Boot


I'm using Tika parsers in my project, specifically these three classes:

import org.apache.tika.Tika;
import org.apache.tika.parser.txt.CharsetDetector;
import org.apache.tika.parser.txt.CharsetMatch;

I recently upgraded from Tika 1.0 to Tika 1.20, and since then it throws warnings like:

WARN  org.apache.tika.parser.SQLite3Parser : org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.

I don't need these dependencies in my app, so I tried to get rid of the warnings in the following ways:

  0. Created a tika-config.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <service-loader initializableProblemHandler="ignore"/>
</properties>

  1. Added a tika.config property to application.yaml, with both a relative and an absolute path to the tika-config.xml file. It didn't work.
  2. Set a TIKA_CONFIG environment variable. That didn't work either.
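Attempt 1 most likely has no effect because, in Tika 1.x, the default `TikaConfig` looks up the config location in the `tika.config` JVM system property (falling back to the `TIKA_CONFIG` environment variable), while a key in Spring Boot's `application.yaml` only ends up in Spring's `Environment`, not in `System` properties. A minimal sketch, assuming the property is set at the very start of `main`, before anything calls `new Tika()`; the class `TikaConfigBootstrap`, its helper, and the `config/tika-config.xml` path are hypothetical:

```java
import java.nio.file.Paths;

public class TikaConfigBootstrap {

    /**
     * Sets the "tika.config" system property to defaultPath unless it is
     * already set (e.g. via -Dtika.config=... on the command line).
     * Returns the value in effect afterwards.
     */
    static String ensureTikaConfigProperty(String defaultPath) {
        String current = System.getProperty("tika.config");
        if (current == null || current.trim().isEmpty()) {
            System.setProperty("tika.config", defaultPath);
        }
        return System.getProperty("tika.config");
    }

    public static void main(String[] args) {
        // Hypothetical location; an absolute path avoids surprises with the working directory.
        String path = Paths.get("config", "tika-config.xml").toAbsolutePath().toString();
        String inEffect = ensureTikaConfigProperty(path);
        System.out.println("tika.config = " + inEffect);
        // Start Spring Boot only after this point, e.g.:
        // SpringApplication.run(MyApplication.class, args);
    }
}
```

Passing `-Dtika.config=/absolute/path/tika-config.xml` to the JVM should have the same effect without any code.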

Is there anything else I can try to get rid of these warnings?


Solution

  • You get this warning because the SQLite JDBC driver (sqlite-jdbc) is not bundled with the Tika jars: https://cwiki.apache.org/confluence/display/tika/SQLite%20Parser

    Either exclude the SQLite parser with a config like this, or add the sqlite-jdbc dependency:

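    <?xml version="1.0" encoding="UTF-8"?>
    <properties>
      <parsers>
        <!-- Load all default parsers except the SQLite parser, which needs sqlite-jdbc -->
        <parser class="org.apache.tika.parser.DefaultParser">
          <parser-exclude class="org.apache.tika.parser.jdbc.SQLite3Parser"/>
        </parser>
      </parsers>
    </properties>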
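    The exclusion only helps if Tika actually reads this file. One way to be sure is to load it explicitly and hand it to the `Tika` facade, using the `TikaConfig(InputStream)` and `Tika(TikaConfig)` constructors from Tika 1.x. A sketch, assuming the file is packaged as the classpath resource `/tika-config.xml` (e.g. in src/main/resources); the `TikaFactory` class and its methods are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.Tika;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.exception.TikaException;
import org.xml.sax.SAXException;

public final class TikaFactory {

    private TikaFactory() {
    }

    /** Builds a Tika facade from the given config XML instead of the default config. */
    public static Tika fromConfig(InputStream configXml)
            throws TikaException, IOException, SAXException {
        TikaConfig config = new TikaConfig(configXml);
        return new Tika(config);
    }

    /** Loads the config from a classpath resource such as "/tika-config.xml" (hypothetical name). */
    public static Tika fromClasspath(String resource)
            throws TikaException, IOException, SAXException {
        try (InputStream in = TikaFactory.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Tika config not found on classpath: " + resource);
            }
            return fromConfig(in);
        }
    }
}
```

    In a Spring Boot app this could be exposed as a `@Bean`, so the rest of the code injects the configured `Tika` instead of calling `new Tika()`, which falls back to the default config lookup.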
    

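    If you would rather keep SQLite support, add the sqlite-jdbc dependency to your build, ideally matching the version listed in tika-parsers/pom.xml for your Tika release (with Gradle, use the same org.xerial:sqlite-jdbc coordinates). For Maven, add this to your pom.xml: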

    <dependency>
      <groupId>org.xerial</groupId>
      <artifactId>sqlite-jdbc</artifactId>
      <version>3.8.10.1</version> 
    </dependency>
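    To confirm the driver actually ends up on the runtime classpath, a small independent check (not Tika's own code) can try to load `org.sqlite.JDBC`, the JDBC driver class shipped in sqlite-jdbc. The `ClasspathCheck` class is a hypothetical helper:

```java
public final class ClasspathCheck {

    private ClasspathCheck() {
    }

    /** Returns true if the named class can be loaded (without running its static initializers). */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // org.sqlite.JDBC is the driver class provided by the sqlite-jdbc artifact.
        boolean present = isOnClasspath("org.sqlite.JDBC");
        System.out.println("sqlite-jdbc on classpath: " + present);
    }
}
```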