spaCy: add a custom pipeline component that rewrites doc.text


I'm trying to add a custom component to spaCy's pipeline. I want it to convert my text to lowercase.

My code:

import spacy

nlp = spacy.load('en_core_web_sm')

def lower_component(doc):
    # This assignment is what raises the AttributeError: Doc.text is read-only
    doc.text = doc.text.lower()
    return doc

nlp.add_pipe(lower_component, first=True)
print('Pipeline:', nlp.pipe_names)

doc = nlp("Hello world!")
doc

I get this error:

AttributeError: attribute 'text' of 'spacy.tokens.doc.Doc' objects is not writable

Do you have a solution for my problem?


Solution

  • I found it! Just pass a class whose __call__ returns a new Doc built from the lowercased text, instead of assigning to doc.text:

    from spacy.language import Language
    from spacy.tokens import Doc

    class Lower(object):
        name = "Lower"
    
        nlp: Language
    
        def __init__(self, nlp: Language):
            self.nlp = nlp
    
        def __call__(self, doc: Doc) -> Doc:
            # Doc.text cannot be assigned, so re-tokenize the lowercased text into a new Doc
            text = doc.text
            return self.nlp.make_doc(text.lower())
    

    and then add it to the pipeline:

    nlp.add_pipe(Lower(nlp), first=True)
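
Note for spaCy v3: passing a component object straight to nlp.add_pipe is the v2 form, and v3 expects the string name of a registered factory instead. Below is a minimal sketch of the same idea under v3. The factory name "lower" and the helper create_lower are names picked for this example, not anything spaCy ships, and spacy.blank("en") stands in for en_core_web_sm so that no model download is assumed.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc


class Lower:
    """Component that replaces the incoming Doc with a lowercased copy."""

    def __init__(self, nlp: Language):
        self.nlp = nlp

    def __call__(self, doc: Doc) -> Doc:
        # Doc.text is read-only, so build a new Doc from the lowercased text.
        return self.nlp.make_doc(doc.text.lower())


# spaCy v3 looks components up by registered name.
# "lower" and create_lower are hypothetical names chosen for this sketch.
@Language.factory("lower")
def create_lower(nlp: Language, name: str):
    return Lower(nlp)


# Blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")
nlp.add_pipe("lower", first=True)

doc = nlp("Hello World!")
print("Pipeline:", nlp.pipe_names)
print(doc.text)
```

Since the component returns a brand-new Doc, anything set on the original Doc by earlier components would be lost, which is why it is added with first=True.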