
Output text with specifically chosen tokens in curly braces with spaCy


I want to print a sentence in my terminal with some specific words wrapped in curly braces. For instance, if I want the words in the 5th and 7th positions of this sentence to be wrapped:

My important word is here and there.

The output should be:

My important word is {here} and {there}.

I want the solution to be in Python, and in particular with spaCy. So far I have managed to write a program like this:

import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('My important word is here and there.')
my_important_words = [4,6]
for token in doc:
    if token.i in my_important_words:
        print("{"+token.text+"}")
    else:
        print(token.text)

But not only does my for loop print the words one per line, the program also seems pretty verbose to me. I cannot believe a library like spaCy does not have a straightforward one- or two-liner for this.

Any solution?

PS: I know there are fancy displaCy solutions for highlighting words with some labeled property, like this: Spacy Verb highlight?

But that is not really the same, because 1) my set of words is a list of words/tokens arbitrarily chosen by me, and 2) I do not want displaCy to render HTML. I just want a plain print in my terminal.


Solution

  • A two-liner for your use case could be:

    import re
    import spacy
    
    nlp = spacy.load('en_core_web_sm')  # the small model from the question is enough here
    doc = nlp('My important word is here and there.')
    
    my_important_words = [4,6]
    
    # First line: wrap the chosen tokens in braces and join all tokens with spaces (this leaves an extra space before punctuation)
    output_string = " ".join([token.text if token.i not in my_important_words else '{'+token.text+'}' for token in doc])
    
    # Second line: remove the extra space before punctuation (raw string; '-' is placed last in the class so it is a literal, not a range)
    output_string = re.sub(r' ([@.#$/:;?!,-])', r'\1', output_string)
    
    # Results
    print(output_string)
    

    Running the code above prints what you're looking for in the terminal:

    My important word is {here} and {there}.

    Hope it helps.
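
As an alternative that avoids the regex cleanup, here is a minimal sketch that reuses the trailing whitespace spaCy stores on each token (`token.whitespace_`), so the original spacing is reproduced as-is. The helper name `brace_tokens` and its parameters are made up for illustration and are not part of spaCy.

```python
def brace_tokens(doc, indices, left="{", right="}"):
    """Return the text of `doc` with the tokens at `indices` wrapped in braces.

    `doc` can be any iterable of objects exposing `.i`, `.text` and
    `.whitespace_`, which is what spaCy tokens provide.
    """
    wanted = set(indices)  # set lookup instead of scanning a list for every token
    return "".join(
        # wrap the selected tokens, then re-attach the whitespace that followed them
        (left + tok.text + right if tok.i in wanted else tok.text) + tok.whitespace_
        for tok in doc
    )


# Usage with spaCy (assumption: a tokenizer-only pipeline is enough for this,
# so no trained model has to be downloaded):
#   import spacy
#   nlp = spacy.blank("en")
#   doc = nlp("My important word is here and there.")
#   print(brace_tokens(doc, [4, 6]))
```

Because the whitespace comes from the tokens themselves, no space is inserted before the final period, and nothing needs to be patched afterwards.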