
How to get the span of a conjunct in spacy?


I use spaCy's token.conjuncts to get the conjuncts of each token.

However, token.conjuncts returns a tuple, and I want a Span instead. For example:

import spacy
nlp = spacy.load("en_core_web_lg")

sentence = "I like to eat food at the lunch time, or even at the time between a lunch and a dinner"
doc = nlp(sentence)
for token in doc:
    conj = token.conjuncts
    print(type(conj))

# output: <class 'tuple'>

Does anyone know how to convert this tuple into a Span?

Or is there a way to get the conjuncts as spans directly?

The reason I need a Span is that I want to use it to locate the conjunct in the document, for example to find which noun chunk or split it belongs to (whatever way I use to split the text).

Currently, I convert the tuple to str and iterate over all the splits or noun chunks to check whether any of them contains this conjunct.

However, this has a bug: when the text of a conjunct appears in more than one split or noun chunk, I cannot tell which one actually contains it, because I only compare strings and ignore the index or id of the conjunct. If I had a span for the conjunct, I could locate it exactly.

Please feel free to comment, thanks in advance!


Solution

  • token.conjuncts returns a tuple of Token objects. To get a one-token span for a conjunct, slice the doc by the token's index: doc[conj.i : conj.i + 1]

    import spacy
    
    nlp = spacy.load('en_core_web_sm')
    
    
    sentence = "I like oranges and apples and lemons."
    
    
    doc = nlp(sentence)
    
    for token in doc:
        if token.conjuncts:
            conjuncts = token.conjuncts             # tuple of Token objects
            print("Conjuncts for", token.text)
            for conj in conjuncts:
                # conj is a Token; slicing the doc by its index gives a one-token Span
                span = doc[conj.i : conj.i + 1]
                print(span.text, type(span))
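
To find which split or noun chunk a conjunct belongs to, one option is to compare token positions instead of strings, which avoids the ambiguity when the same text occurs more than once. The following is only a sketch under some assumptions: the parse is written by hand so that it runs without a trained model, the splits list is made up, and containing_span is a hypothetical helper, not part of spaCy.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-written parse for illustration only; a real pipeline such as
# en_core_web_sm would normally supply the heads and dependency labels.
words = ["I", "like", "oranges", "and", "apples", "and", "lemons", "."]
heads = [1, 1, 1, 2, 2, 4, 4, 1]  # absolute index of each token's head
deps = ["nsubj", "ROOT", "dobj", "cc", "conj", "cc", "conj", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

# Made-up "splits"; doc.noun_chunks or any other list of Span objects
# taken from the same doc could be used here instead.
splits = [doc[0:3], doc[3:5], doc[5:8]]


def containing_span(token, spans):
    """Return the first span whose token range covers `token`, else None."""
    for span in spans:
        # Span.start is inclusive and Span.end is exclusive, like a slice.
        if span.start <= token.i < span.end:
            return span
    return None


for conj in doc[2].conjuncts:  # conjuncts of "oranges"
    span = containing_span(conj, splits)
    print(conj.text, conj.i, "->", span.text if span is not None else None)
```

Because the check uses token.i together with span.start and span.end, two chunks with identical text are still told apart by their positions in the doc.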