java, weka, snowball

Weka Snowball not working


I'm trying to create an Italian text classifier with Weka using Weka's StringToWordVector to create the features.

The classifier works fine, but the stemmer I set on the filter doesn't work. This is my code:

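import weka.core.stemmers.SnowballStemmer;
import weka.filters.unsupervised.attribute.StringToWordVector;

// Wrapper around the Snowball stemmers; the variable is sb, not snowball
SnowballStemmer sb = new SnowballStemmer();
sb.setStemmer("italian");

StringToWordVector str2Words = new StringToWordVector();
String[] options_wordVector = { /*other options*/};
str2Words.setOptions(options_wordVector);
// Set the stemmer after setOptions so the options do not override it
str2Words.setStemmer(sb);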

When I start debugging, this message appears in the console log:

Stemmer 'italian' unknown!

I tried calling sb.stem(string) directly too, but the same message appears and the returned string is identical to the input.

How can I make it work?


Solution

  • Solved.

    I misunderstood Weka's stemmer docs: weka.jar does contain the weka.core.stemmers package, but that package holds only the wrapper class.

    The Snowball classes themselves are not included; they just have to be present on the classpath. The reason is that this way the Weka team doesn't have to track new versions of the stemmers and update them.

    The code in the question actually works once the Snowball stemmers are added to the classpath.
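    For anyone hitting the same message, a quick way to tell whether the Snowball classes are visible at runtime is to try loading them by name. This is only a minimal sketch and uses nothing from Weka itself. The class names are an assumption on my part: as far as I know the wrapper looks for the stemmers under org.tartarus.snowball, so check the names against the Snowball jar you actually use. SnowballClasspathCheck and isOnClasspath are made-up names for illustration.

```java
// Sketch: check whether the Snowball classes the Weka wrapper needs can be loaded.
// The class names in main() are assumptions, not taken from the post above.
class SnowballClasspathCheck {

    // Returns true if the named class can be located by the current class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, SnowballClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.tartarus.snowball.SnowballProgram",   // assumed base class
            "org.tartarus.snowball.ext.italianStemmer" // assumed Italian stemmer class
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

    Run it with the same classpath as the classifier. If either line reports MISSING, that would be consistent with the "Stemmer 'italian' unknown!" message, and adding the Snowball jar to the classpath should be the fix.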