spacy, spacy-3

How do I make spaCy treat noun chunks separated by "and" or "," as one?


I'm sorry about the title; I didn't know how to phrase it, but hopefully these examples will make it clear.

For the following sentence:

Ashley and Brian are drinking water.

I want the noun chunk to be "Ashley and Brian", but instead I get "Ashley" and "Brian".

Another example is:

Types of clothes include shirts, pants and trousers.

I want the noun chunk to be "shirts, pants and trousers" instead of "shirts", "pants" and "trousers".

How do I solve this problem?


Solution

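  • What you are describing is not a noun chunk. The conjuncts feature (Token.conjuncts) is closer to what you want.

    The code below looks for nouns that have conjuncts and prints the span from the left edge to the right edge of each such noun's subtree. This might not work for complex sentences, but it covers your examples and typical cases.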

    import spacy
    
    nlp = spacy.load("en_core_web_sm")
    
    texts = [
            "Ashley and Brian are drinking water.",
            "Types of clothes include shirts, pants and trousers.",
            ]
    
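    for text in texts:
        print("-----")
        print(text)
        checked = 0  # index of the first token not yet covered by a printed span
        doc = nlp(text)
        for tok in doc:
            # Skip tokens already covered, and anything that is not a noun.
            if tok.i < checked:
                continue
            if tok.pos_ not in ("NOUN", "PROPN"):
                continue

            # Token.conjuncts is non-empty when the token is coordinated with others.
            if tok.conjuncts:
                # left_edge/right_edge bound the token's subtree; for the first
                # conjunct this covers the whole coordination, "," and "and" included.
                print(doc[tok.left_edge.i : tok.right_edge.i + 1])
                checked = tok.right_edge.i + 1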
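
If you want the spans back as a list instead of printed output, the same loop can be wrapped in a function. This is only a sketch of the approach above: the names coordinated_spans and demo are made up for illustration and are not part of spaCy, and the result depends on the parser labelling the coordination with the "conj" dependency.

```python
from typing import List

import spacy
from spacy.tokens import Doc, Span


def coordinated_spans(doc: Doc) -> List[Span]:
    """Collect one span per coordinated noun group (hypothetical helper)."""
    spans = []
    checked = 0  # index of the first token not yet covered by a span
    for tok in doc:
        # Skip tokens inside an earlier span, and anything that is not a noun.
        if tok.i < checked or tok.pos_ not in ("NOUN", "PROPN"):
            continue
        # A non-empty conjuncts tuple means the token is part of a coordination.
        if tok.conjuncts:
            # The subtree of the first conjunct runs from left_edge to right_edge.
            span = doc[tok.left_edge.i : tok.right_edge.i + 1]
            spans.append(span)
            checked = span.end
    return spans


def demo() -> None:
    """Run the helper on the question's sentences (needs en_core_web_sm installed)."""
    nlp = spacy.load("en_core_web_sm")
    for text in [
        "Ashley and Brian are drinking water.",
        "Types of clothes include shirts, pants and trousers.",
    ]:
        print([span.text for span in coordinated_spans(nlp(text))])
```

Call demo() to try it on the two sentences from the question. Returning Span objects rather than strings keeps the token offsets available, so you can still compare the results against doc.noun_chunks or filter them further.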