
Support for other languages in the Mahout classifier


I am training a Mahout naive Bayes classifier. My training data looks like this:

 Sports --> "text from different languages but related to sports"
 Health --> "text from different languages but related to health"

In this case, will Mahout support text in languages other than English, or will text in other languages be ignored?


Solution

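  • Yes and no. The classifier is happy to operate on any strings and does not assign meaning to them, so the language of the text is irrelevant: each distinct token is simply a separate feature. However, it has no way of knowing that "sports" and "deportes" are the same word in different languages, so it learns each of them only from the training examples that contain it.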
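To make the point concrete, here is a toy multinomial naive Bayes in plain Python. It is not Mahout's API: the `train`/`classify` helpers and the sample sentences are hypothetical, chosen only to illustrate the behavior described in the answer. Every whitespace-separated token is treated as an opaque feature, so English and Spanish words are counted independently.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """Hypothetical helper. docs: list of (label, text) pairs.
    Returns per-label token counts, per-label document counts, and the vocabulary."""
    token_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for label, text in docs:
        tokens = text.lower().split()  # tokens are opaque strings; language is irrelevant
        token_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return token_counts, doc_counts, vocab

def classify(model, text):
    """Hypothetical helper. Returns the label with the highest log-posterior."""
    token_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        counts = token_counts[label]
        total = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for tok in text.lower().split():
            # Laplace smoothing: unseen tokens get a small, non-zero probability
            score += math.log((counts[tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Mixed-language training data (made-up sentences for illustration)
docs = [
    ("Sports", "the team won the football match"),
    ("Sports", "el equipo ganó el partido de fútbol"),
    ("Health", "the doctor prescribed medicine"),
    ("Health", "el médico recetó medicina"),
]
model = train(docs)

print(classify(model, "fútbol partido"))   # Spanish sports text
print(classify(model, "médico medicina"))  # Spanish health text
```

Note that `football` and `fútbol` end up as two unrelated keys in the `Sports` counts. The classifier handles both languages, but each language needs its own training examples, because evidence for a word never transfers to its translation.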