nlp tokenize sentiment-analysis

I am looking for a Dutch-language tokenizer for technical product reviews


I am trying to find a better text-cleaning method for a Dutch NLP problem. I have used a Dutch version for POS tags and NLTK for stop word removal, but I am not getting the desired results.


Solution

  • Have you tried this approach for Dutch?

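    import nltk
    from nltk.util import ngrams
    from nltk.corpus import alpino

    nltk.download("alpino")  # fetch the Dutch Alpino corpus on first use
    words = alpino.words()
    print(words)  # preview of the corpus tokens

    # Build 4-grams over the corpus tokens and print each one
    quadgrams = ngrams(words, 4)
    for gram in quadgrams:
        print(gram)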
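
    For the tokenizing and stop word part of the question, here is a minimal sketch using only the standard library. The regex, the function names `tokenize_dutch` and `remove_stop_words`, the tiny stop word set, and the sample review are all assumptions made up for illustration, not something from the question. In practice you would probably swap in NLTK's `word_tokenize(text, language="dutch")` and `stopwords.words("dutch")`, which need the corresponding NLTK data downloaded first.

```python
import re

# Tiny illustrative stop word set (an assumption for this sketch, far from complete).
DUTCH_STOP_WORDS = {
    "de", "het", "een", "en", "is", "maar", "van", "ik",
    "niet", "te", "op", "met", "dat", "die", "in", "voor",
}

# First alternative: numbers, allowing a decimal comma or point ("1,5").
# Second alternative: runs of letters joined by internal apostrophes or
# hyphens, so "zo'n" and "USB-kabel" each stay one token.
TOKEN_PATTERN = re.compile(r"\d+(?:[.,]\d+)*|[^\W\d_]+(?:['-][^\W\d_]+)*")


def tokenize_dutch(text):
    """Return lowercased tokens; punctuation between tokens is dropped."""
    return [match.group(0).lower() for match in TOKEN_PATTERN.finditer(text)]


def remove_stop_words(tokens, stop_words=DUTCH_STOP_WORDS):
    """Keep only the tokens that are not in the stop word set."""
    return [token for token in tokens if token not in stop_words]


# Hypothetical product review sentence, used only as sample input.
review = "De batterij van deze laptop is goed, maar de USB-kabel is te kort."
tokens = tokenize_dutch(review)
content_tokens = remove_stop_words(tokens)
print(tokens)
print(content_tokens)
```

    Keeping hyphenated compounds together matters for technical reviews, where terms like "USB-kabel" carry the product detail you probably want to keep as a single feature.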