I'd like to detect the language of texts using langdetect. According to the documentation, I have to set a seed to get stable results.
Language detection algorithm is non-deterministic, which means that if you try to run it on a text which is either too short or too ambiguous, you might get different results everytime you run it. To enforce consistent results, call following code before the first language detection:
from langdetect import DetectorFactory
DetectorFactory.seed = 0
As shown below, setting the seed does not seem to work: the results still change between calls. What did I miss?
from langdetect import detect, detector_factory, detect_langs

my_string = "Hi, my friend lives next to me. Can you call her? Thibault François. Envoyé depuis mon mobile"

# Attempt to fix the seed before the first detection
detector_factory.seed = 42

for i in range(5):
    print(detect_langs(my_string), detect(my_string))
Example output:
[fr:0.7142820855500301, en:0.28571744799229243] en
[fr:0.7142837342663328, en:0.2857140098811736] en
[en:0.571427940246422, fr:0.4285710874902514] fr
[en:0.5714284102904427, fr:0.42857076299207464] fr
[en:0.5714277269187811, fr:0.4285715961184375] fr
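If you use DetectorFactory (as suggested in the documentation) instead of detector_factory, it works. As far as I can tell, detector_factory is the module that contains the class, so assigning seed to it only creates an unused module-level attribute; the seed that detection actually reads is the class attribute DetectorFactory.seed.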
from langdetect import detect, DetectorFactory, detect_langs

my_string = "Hi, my friend lives next to me. Can you call her? Thibault François. Envoyé depuis mon mobile"

# Set the seed on the DetectorFactory class, before the first detection
DetectorFactory.seed = 42

for i in range(5):
    print(detect_langs(my_string), detect(my_string))
Result:
[en:0.5714271973455635, fr:0.42857096898887964] en
[en:0.5714271973455635, fr:0.42857096898887964] en
[en:0.5714271973455635, fr:0.42857096898887964] en
[en:0.5714271973455635, fr:0.42857096898887964] en
[en:0.5714271973455635, fr:0.42857096898887964] en
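To illustrate why the module-level assignment has no effect, here is a minimal sketch that does not use langdetect at all. ToyFactory and toy_module are hypothetical stand-ins invented for this example, and the assumption is simply that the seed is looked up on the class (or its instances), not on the module that contains it.

```python
import random
import types


class ToyFactory:
    """Hypothetical stand-in for a detector factory (not langdetect's code)."""

    seed = None  # class-level seed, read each time detect() runs

    def detect(self, text):
        # seed=None makes Random() draw fresh entropy on every call
        rng = random.Random(self.seed)
        return rng.random()


# Hypothetical module object holding the class, to mimic a package where
# the module and the class have confusingly similar names.
toy_module = types.ModuleType("toy_factory")
toy_module.ToyFactory = ToyFactory
factory = ToyFactory()

# Assigning to the module only creates a new, unrelated attribute.
toy_module.seed = 42
unseeded = {factory.detect("hi") for _ in range(5)}

# Assigning to the class changes what the instance actually reads.
ToyFactory.seed = 42
seeded = {factory.detect("hi") for _ in range(5)}

print(len(unseeded), len(seeded))
```

The first set should contain several different values, while the second collapses to a single value once the class attribute is set, which mirrors the varying and stable outputs shown above.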