
Java language detection with langdetect - how to load profiles?


I'm trying to use a Java library called langdetect. It couldn't be easier to use:

// Runs inside a method that takes `text` as a parameter and
// declares `throws LangDetectException`, so no catch-and-rethrow is needed.
String path = "C:/Users/myUser/Desktop/jars/langdetect/profiles";
DetectorFactory.loadProfile(path);   // load the language profiles from disk

Detector detector = DetectorFactory.create();
detector.append(text);
return detector.detect();            // the detected language code

The one sticking point is the DetectorFactory.loadProfile method. The library works great when I pass it an absolute file path, but ultimately I think I need to package my code and langdetect's companion profiles directory inside the same JAR file:

myapp.jar/
    META-INF/
    langdetect/
        profiles/
            af
            bn
            en
            ...etc.
    com/
        me/
            myorg/
                LangDetectAdaptor --> is what actually uses the code above

I will make sure that LangDetectAdaptor, which lives inside myapp.jar, is supplied with both the langdetect.jar and jsonic.jar dependencies that langdetect needs at runtime. However, I'm confused about what I need to pass to DetectorFactory.loadProfile for this to work:

  • The langdetect JAR ships with the profiles directory, but you need to initialize it from inside your own JAR. So do I copy the profiles directory into my JAR (as shown above), or is there a way to keep it inside langdetect.jar and access it from my code?

Thanks in advance for any help here!

Edit: I think the problem here is that langdetect ships with this profiles directory, but then wants you to initialize it from inside your JAR. The API would probably benefit from treating the profiles as its own configuration and then providing methods like DetectorFactory.loadProfiles().except("fr") for when you don't want it to initialize French, etc. But this still doesn't solve my problem!


Solution

  • Looks like the library only accepts files on disk. You can either change the code and try submitting the changes upstream, or write your resources to a temporary directory and have the library load them from there.
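
The temp-directory workaround could be sketched roughly as below. This is only an illustration under stated assumptions: the class name ProfileExtractor and the method extractToTempDir are made up for this example, the resource directory ("langdetect/profiles") follows the JAR layout in the question, and the caller passes the profile names explicitly, because listing the contents of a directory inside a JAR is not straightforward.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper (not part of langdetect): copies profile resources
// from the classpath into a temporary directory on disk.
final class ProfileExtractor {

    private ProfileExtractor() {
    }

    /**
     * Copies each named resource under resourceDir into a fresh temporary
     * directory and returns that directory.
     *
     * @param loader      class loader that can see the packaged profiles
     * @param resourceDir classpath directory, e.g. "langdetect/profiles" (assumed layout)
     * @param names       profile file names to copy, e.g. "af", "bn", "en"
     */
    static Path extractToTempDir(ClassLoader loader, String resourceDir,
                                 List<String> names) throws IOException {
        Path tempDir = Files.createTempDirectory("langdetect-profiles");
        // Shutdown deletion runs in reverse registration order, so the
        // directory (registered first) is removed after the files inside it.
        tempDir.toFile().deleteOnExit();

        for (String name : names) {
            String resource = resourceDir + "/" + name;
            try (InputStream in = loader.getResourceAsStream(resource)) {
                if (in == null) {
                    throw new IOException("Profile not found on classpath: " + resource);
                }
                Path target = tempDir.resolve(name);
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                target.toFile().deleteOnExit();
            }
        }
        return tempDir;
    }
}
```

With something like this in place, the adaptor could call extractToTempDir(LangDetectAdaptor.class.getClassLoader(), "langdetect/profiles", names) once at startup and pass the returned directory's toString() to DetectorFactory.loadProfile, exactly as with the absolute path in the question. The same approach should work whether the profiles live in myapp.jar or stay in langdetect.jar, as long as the resource path matches where they actually sit on the classpath.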