I'm using spaCy for some downstream tasks, mainly noun phrase extraction. My texts contain a lot of parentheses, and when I look at the lemmas, I noticed that all punctuation that doesn't end a sentence is lemmatized to --:
import spacy
nlp = spacy.load("de_core_news_sm")
doc = nlp("(Das ist ein Test!)")
for token in doc:
    print(f"Text: '{token.text}', Lemma: '{token.lemma_}'")
Output:
Text: '(', Lemma: '--'
Text: 'Das', Lemma: 'der'
Text: 'ist', Lemma: 'sein'
Text: 'ein', Lemma: 'ein'
Text: 'Test', Lemma: 'Test'
Text: '!', Lemma: '--'
Text: ')', Lemma: '--'
Is that normal, and if so, why? What can I do to keep the parentheses?
I'm on spaCy 3.7.4 with Python 3.11.
I can confirm the issue with German, but when I try the equivalent sentence in Dutch, the ( and ) are kept as lemmas instead of --. So this is something particular to the German model.
You can override the default lemmata if you want:
import spacy
nlp = spacy.load("de_core_news_sm")
nlp.get_pipe("attribute_ruler").add([[{"TEXT": "("}]], {"LEMMA": "("})
nlp.get_pipe("attribute_ruler").add([[{"TEXT": ")"}]], {"LEMMA": ")"})
doc = nlp("(Das ist ein Test!)")
print(doc.text)
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)
Result:
(Das ist ein Test!)
( ( PUNCT punct
Das der PRON sb
ist sein AUX ROOT
ein ein DET nk
Test Test NOUN pd
! -- PUNCT punct
) ) PUNCT punct
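Note that ! in the output above still lemmatizes to --, so if you want to keep other punctuation as well, you can register one rule per character. A minimal sketch of that idea, using a blank German pipeline so it runs without downloading de_core_news_sm (the punctuation list here is just an example, extend it as needed):

```python
import spacy

# Blank German pipeline: tokenization only, no trained lemmatizer.
nlp = spacy.blank("de")
ruler = nlp.add_pipe("attribute_ruler")

# Pin each punctuation mark's lemma to its own text.
for punct in ["(", ")", "!", "?", ",", ";", ":"]:
    ruler.add([[{"TEXT": punct}]], {"LEMMA": punct})

doc = nlp("(Das ist ein Test!)")
print([(t.text, t.lemma_) for t in doc])
```

In a full pipeline like de_core_news_sm, the attribute_ruler runs after the statistical components, so these rules override whatever lemma the model assigned, exactly as in the two add() calls above.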