Consider the following code:
final String[] texts = {
    "Allons, enfants de la Patrie, Le jour de gloire est arrivé",
    "O Tannenbaum, o Tannenbaum, wie treu sind deine Blätter!",
    "..."
};
final LanguageDetector ld = new OptimaizeLangDetector(); // or e.g. OpenNLPDetector
ld.loadModels();
Arrays.stream(texts).parallel().forEach(text -> System.out.println(ld.detect(text)));
Can I assume that ld.detect() and ld.detectAll() are thread-safe and can be run in parallel on multiple texts using a single LanguageDetector instance?
What worries me is that LanguageDetector has methods like addText(), hasEnoughText() and reset(), which make it stateful and therefore, by definition, not thread-safe...
https://tika.apache.org/2.7.0/api/org/apache/tika/language/detect/LanguageDetector.html
The simplest way for a class to be thread-safe is to be immutable: after construction, no instance method changes any member. A mutable class can still be thread-safe, but only if every access to its shared state is properly synchronized.
Reading the source of org.apache.tika.langdetect.optimaize.OptimaizeLangDetector, we see this instance method:
public void reset() {
    writer.reset();
}
which changes the member
private CharArrayWriter writer;
and with it the state of the OptimaizeLangDetector instance. Because that mutable state is shared by every thread that uses the same instance, OptimaizeLangDetector is not thread-safe.
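
One common way around this is to give each thread its own detector instead of sharing one. Below is a minimal sketch of that idea using ThreadLocal. To keep it self-contained, ToyDetector is a hypothetical stand-in with a mutable CharArrayWriter buffer, not the Tika class, and its "detection" rule is a toy. With Tika you would presumably have the supplier create and load a real detector instead; that part is an assumption and is not shown here.

```java
import java.io.CharArrayWriter;
import java.util.List;
import java.util.stream.Collectors;

public class PerThreadDetectorSketch {

    // Hypothetical stand-in for a stateful detector; NOT the Tika class.
    static final class ToyDetector {
        // Mutable buffer, analogous to the writer member discussed above.
        private final CharArrayWriter writer = new CharArrayWriter();

        void reset() {
            writer.reset();
        }

        void addText(CharSequence text) {
            writer.append(text);
        }

        // Toy rule for illustration only: keys on one word per language.
        String detect() {
            String buffered = writer.toString();
            if (buffered.contains("Patrie")) return "fr";
            if (buffered.contains("Tannenbaum")) return "de";
            return "unknown";
        }

        // Goes through the mutable buffer, so a shared instance could see
        // text from another thread between reset() and detect().
        String detect(CharSequence text) {
            reset();
            addText(text);
            return detect();
        }
    }

    // One detector per thread: no buffer is ever shared between threads.
    private static final ThreadLocal<ToyDetector> DETECTOR =
            ThreadLocal.withInitial(ToyDetector::new);

    static List<String> detectInParallel(List<String> texts) {
        return texts.parallelStream()
                .map(text -> DETECTOR.get().detect(text))
                .collect(Collectors.toList()); // keeps the input order
    }

    public static void main(String[] args) {
        List<String> texts = List.of(
                "Allons, enfants de la Patrie, Le jour de gloire est arrivé",
                "O Tannenbaum, o Tannenbaum, wie treu sind deine Blätter!",
                "...");
        System.out.println(detectInParallel(texts)); // prints [fr, de, unknown]
    }
}
```

The trade-off is one detector (and, with a real library, one set of loaded models) per worker thread. If that is too costly, the alternative is to guard a single shared instance with a lock, which serializes the calls and gives up the parallelism.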