I am looking for a way to detect the language of tweets. I tried the Apache Tika library, but it doesn't work well. I have now found langdetect and am trying to use it. I found a code sample, but I don't understand what the "profiles" file is, or what I need to put in it.
String path = "my path to the file profiles";
DetectorFactory.loadProfile(path);
Detector detector = DetectorFactory.create();
detector.append(tweet);
String langDetected = detector.detect();
From the documentation:
Before using this library, call DetectorFactory#loadProfile() once to initialize.
DetectorFactory.loadProfile(profileDirectory);
The parameter of this method is a directory which has files of language profiles. The language profiles are bundled with this library, so specify "trunk/profile" in repository as the parameter of loadProfile().
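The profile files are in the repository, in the profiles subdirectory. So "profiles" is a directory, not a single file: pass the path to that directory to loadProfile().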
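Since the confusion here is about what the path should point to, a small sanity check before calling loadProfile() may help. The sketch below uses only the standard library; the class name ProfileDirectory and the method listProfileNames are hypothetical names made up for this illustration, not part of langdetect. It assumes the profile directory holds one regular file per language, which is my understanding of the bundled profiles but worth confirming against your checkout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NotDirectoryException;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of langdetect): inspects the directory
// you intend to pass to DetectorFactory.loadProfile().
final class ProfileDirectory {
    private ProfileDirectory() {}

    /** Returns the sorted names of the regular files directly inside dir. */
    static List<String> listProfileNames(Path dir) throws IOException {
        // loadProfile() expects a directory, so reject anything else early.
        if (!Files.isDirectory(dir)) {
            throw new NotDirectoryException(dir.toString());
        }
        // Files.list holds an open directory handle, so close it when done.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)   // skip subdirectories
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

If the returned list is non-empty and looks like a set of language profiles, the same path (as a String) should be what DetectorFactory.loadProfile(path) in the question's code needs. An empty list or a NotDirectoryException would suggest the path points at the wrong place.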