SpaCy: How to manually set POS tag for vertical bar "|"?


When spaCy tags text, the vertical bar "|" is assigned different POS tags depending on the context, such as "ADV", "DEL", etc. I want "|" to always be recognized as "PUNCT". Is there a way to force this POS tag for "|"?

I tried the following command, but it didn't work:

nlp.tokenizer.add_special_case('|', [{ORTH: '|', POS: PUNC}])

Solution

  • I would add a simple custom component to the pipeline, right after the tagger:

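    import spacy

    def pos_postprocessor_pipe(doc):
        # Overwrite the coarse-grained POS of every "|" token.
        for token in doc:
            if token.text == '|':
                token.pos_ = 'PUNCT'
        return doc

    nlp = spacy.load("en_core_web_sm")
    # spaCy v2 API: add_pipe accepts the function object itself.
    nlp.add_pipe(pos_postprocessor_pipe, name="pos_postprocessor", after='tagger')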
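
    The snippet above passes the function straight to add_pipe, which is the spaCy v2 style. If you are on spaCy v3, a rough equivalent might look like the sketch below. It assumes v3's requirement that a component be registered under a string name via @Language.component and then added by that name. The component name "pipe_char_punct" and the helper force_pipe_punct are made up for illustration, and spacy.blank("en") is used only so the sketch runs without a downloaded model. With a pretrained v3 pipeline such as en_core_web_sm, the coarse POS is typically filled in by the attribute_ruler, so adding the component last (rather than right after the tagger) is probably the safer choice.

```python
import spacy
from spacy.language import Language


# Hypothetical component name; any unique string should work.
@Language.component("pipe_char_punct")
def force_pipe_punct(doc):
    """Force the coarse-grained POS of every "|" token to PUNCT."""
    for token in doc:
        if token.text == "|":
            token.pos_ = "PUNCT"
    return doc


# Assumption: a blank English pipeline (tokenizer only) keeps this runnable
# without a model download. With a real model, use spacy.load(...) instead.
nlp = spacy.blank("en")

# In v3, add_pipe takes the registered name. last=True runs the component
# after everything else, so nothing later overwrites the tag.
nlp.add_pipe("pipe_char_punct", last=True)

doc = nlp("left | right")
for token in doc:
    print(token.text, token.pos_)
```

    Running the override as the final step is a deliberate choice here: any component that assigns POS tags earlier in the pipeline has already finished by the time "|" is rewritten.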