Stanford CoreNLP - lemmas are not recognised correctly

I’m using the CoreNLP tools from the command line to tag some files containing German text. I need the token, POS, lemma and NER annotations, so I’m using the following command:

java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -filelist $dir/filelist.input -outputFormat conll --add-modules java.se.ee -ner.useSUTime 0 -outputFormatOptions word,pos,lemma,ner -outputDirectory $dir/tagged_articles -replaceExtension -props StanfordCoreNLP-german.properties

However, the lemmas I’m getting are just not right. Here is an example of a tagged file:

Auch ADV auch O
eine ART eine O
ausgereifte ADJA ausgereifte O
Technik NN technik O
kann VMFIN kann O
jedoch ADV jedoch O
an APPR a O
ihre PPOSAT ihre O
Grenzen NN grenzen O
stoßen VVINF stoßen O

For example, the correct lemmas would be: ist -> sein / Textmengen -> Textmenge / enormen -> enorm / Grenzen -> Grenze. So something is obviously wrong, and I’m wondering what it could be. Any hint is highly appreciated!

I am using the following German model: stanford-german-corenlp-2018-02-27-models.jar

According to the README file, the version of the CoreNLP tools is "2018-02-27 3.9.1".

java version "10.0.1" 2018-04-17

Java(TM) SE Runtime Environment 18.3 (build 10.0.1+10)

Solution

At the moment, lemmatization is only supported for English; see "Supported human languages" in the CoreNLP documentation.

You could try using a different lemmatizer and add the lemmas manually.
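
As a rough illustration of adding the lemmas yourself, here is a minimal Python sketch that post-processes the tagged output. It assumes each token line has exactly four whitespace-separated columns (word, POS, lemma, NER), as in the example from the question. The names replace_lemmas, toy_lemmatize and TOY_LEXICON are hypothetical and not part of CoreNLP; the toy lexicon only stands in for whatever external German lemmatizer you choose.

```python
from typing import Callable, Iterable, List

# Hypothetical toy lexicon, for illustration only. In practice this lookup
# would be replaced by a call to an external German lemmatizer.
TOY_LEXICON = {
    ("Grenzen", "NN"): "Grenze",
    ("ist", "VAFIN"): "sein",
}


def toy_lemmatize(word: str, pos: str) -> str:
    """Return a lemma from the toy lexicon, falling back to the word itself."""
    return TOY_LEXICON.get((word, pos), word)


def replace_lemmas(
    lines: Iterable[str],
    lemmatize: Callable[[str, str], str] = toy_lemmatize,
) -> List[str]:
    """Replace the lemma column of 'word pos lemma ner' lines.

    Lines that do not have exactly four columns (for example the blank
    lines that separate sentences) are passed through unchanged.
    """
    result = []
    for line in lines:
        columns = line.split()
        if len(columns) != 4:
            result.append(line.rstrip("\n"))
            continue
        word, pos, _old_lemma, ner = columns
        # Write the columns back tab-separated with the new lemma.
        result.append("\t".join([word, pos, lemmatize(word, pos), ner]))
    return result


if __name__ == "__main__":
    sample = ["Grenzen NN grenzen O", "", "stoßen VVINF stoßen O"]
    for row in replace_lemmas(sample):
        print(row)
```

To use this with a real lemmatizer, pass your own function taking (word, pos) and returning the lemma as the second argument to replace_lemmas, and run it over each file in the output directory.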