java, nlp, stanford-nlp, text-mining, linguistics

Text Processing Tools for German and Spanish Languages


I'm trying to process text in German and Spanish. Working with English text is straightforward because of the myriad NLP packages available for it, but other languages are not as easy. I found some packages for German text, but I don't know which one is more accurate. It is even harder to find an NLP package for Spanish text, considering that the language has some special characters. The steps I need to perform on the text are: sentence splitting, tokenizing, POS tagging, and stemming. In other words, I am looking for something in Java that works on one or both of these languages.

Any information on this topic is appreciated.


Solution

  • I can recommend Freeling; try its online demo. It includes sentence splitting, tokenizing, POS tagging, and other functionality for several languages. I don't know how good it is for German, but for analyzing Spanish it is the best tool I know. I have only used Freeling from Python via the command line, but there are interfaces for Java too, for example the Freeling Java API.

    Good luck!
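
    If you just want a dependency-free starting point for the first two steps from the question (sentence splitting and tokenizing), here is a minimal sketch that uses only the JDK's java.text.BreakIterator with a German or Spanish locale. To be clear, this is not Freeling and it does no POS tagging or stemming; the class name TextSplitter and the helpers sentences and words are made up for this illustration. BreakIterator is rule based, so expect mistakes around abbreviations.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration; not part of Freeling or any NLP library.
class TextSplitter {

    // Splits text into trimmed sentences using the JDK's locale-aware sentence iterator.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    // Splits a sentence into word tokens, dropping whitespace and punctuation segments.
    static List<String> words(String sentence, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(sentence);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = sentence.substring(start, end);
            // Keep only segments that begin with a letter or digit (this includes ñ, ü, ß, ...).
            if (Character.isLetterOrDigit(token.codePointAt(0))) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Locale spanish = Locale.forLanguageTag("es");
        for (String s : sentences("Das ist ein Satz. Hier ist noch einer.", Locale.GERMAN)) {
            System.out.println(s + " -> " + words(s, Locale.GERMAN));
        }
        System.out.println(words("El niño come.", spanish));
    }
}
```

    Java strings are Unicode, so characters such as ñ or ü need no special handling as long as the input is read with the right encoding (for example UTF-8). For POS tagging and stemming you would still need a trained tool such as Freeling.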