Tika LanguageDetection gives error 'No language detectors available'


Tika 2.2.3, simple code

import java.io.IOException;
import org.apache.tika.language.detect.LanguageDetector;
import org.apache.tika.language.detect.LanguageResult;

public class LanguageDetectionDemo {
    public static void main(String[] args) throws IOException {
        // Throws IllegalStateException when no implementation is on the classpath
        LanguageDetector detector = LanguageDetector.getDefaultLanguageDetector();
        detector.addText("This is english");
        detector.addText("This is english");
        detector.addText("This is english");
        detector.addText("This is english");
        detector.addText("This is english");
        detector.addText("This is english");
        detector.addText("This is english");
        detector.addText("This is english");
        detector.addText("This is english");
        LanguageResult languageResult = detector.detect();
    }
}

Running it gives this error (the stack trace points to the getDefaultLanguageDetector() call):

Exception in thread "main" java.lang.IllegalStateException: No language detectors available
    at org.apache.tika.language.detect.LanguageDetector.getDefaultLanguageDetector(LanguageDetector.java:67)

UPDATE: Maven dependencies

<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>2.3.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.tika/tika-langdetect -->
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-langdetect</artifactId>
    <version>2.3.0</version>
    <type>pom</type>
</dependency>

I can't find anything useful online to fix this problem. Am I missing some model files that I must download, and if so, where do I get them?

Any tips much appreciated!


Solution

  • I tried your code and reproduced the same issue. The docs led me to the samples on GitHub, whose pom.xml declares one more dependency. After adding it, I got the expected output: en: HIGH (0.999999).

            <dependency>
                <groupId>org.apache.tika</groupId>
                <artifactId>tika-langdetect-optimaize</artifactId>
                <version>2.3.0</version>
            </dependency>
    
            LanguageDetector detector = LanguageDetector.getDefaultLanguageDetector().loadModels();
            detector.addText("This is english");
            LanguageResult languageResult = detector.detect();
    

    Explanations

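    LanguageDetector is an abstract class, so a concrete implementation (here tika-langdetect-optimaize) must be added as a dependency and be on the classpath at runtime. The tika-langdetect entry with <type>pom</type> in the question evidently did not bring one in, which is why getDefaultLanguageDetector() found nothing to return.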
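
To illustrate why a missing implementation surfaces as a runtime exception rather than a compile error, here is a minimal sketch of classpath-based lookup using only the JDK's java.util.ServiceLoader. It is an analogy, not Tika's actual code: SketchDetector, findDetectors and getDefaultDetector are hypothetical names, and I am assuming Tika's own lookup behaves in a broadly similar way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class DetectorLookupSketch {

    /** Hypothetical stand-in for an abstract detector API; this is not Tika's class. */
    public abstract static class SketchDetector {
        public abstract String detect(String text);
    }

    /** Collects every implementation registered under META-INF/services on the classpath. */
    public static List<SketchDetector> findDetectors() {
        List<SketchDetector> found = new ArrayList<>();
        for (SketchDetector detector : ServiceLoader.load(SketchDetector.class)) {
            found.add(detector);
        }
        return found;
    }

    /** Returns the first registered implementation, or fails if none is registered. */
    public static SketchDetector getDefaultDetector() {
        List<SketchDetector> found = findDetectors();
        if (found.isEmpty()) {
            // Message copied from the question's stack trace for illustration only
            throw new IllegalStateException("No language detectors available");
        }
        return found.get(0);
    }

    public static void main(String[] args) {
        try {
            getDefaultDetector();
        } catch (IllegalStateException e) {
            // Reached when no provider jar or services file is on the classpath
            System.out.println("Lookup failed: " + e.getMessage());
        }
    }
}
```

The code compiles against the abstract type alone, so nothing complains until the lookup runs and comes back empty. Adding a jar that registers an implementation is what makes the lookup succeed, which matches the effect of adding tika-langdetect-optimaize above.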