python nlp chunking phrases spacy

Python: Chunking phrases other than noun phrases (e.g. prepositional) using spaCy, etc.


Since I was told that spaCy is such a powerful Python module for natural language processing, I am now looking for a way to group words into phrases other than noun phrases, most importantly prepositional phrases. I doubt there is a spaCy function for this, but that would be the easiest way, I guess (spaCy is already imported in my project). Nevertheless, I'm open to any approach to phrase recognition/chunking.


Solution

  • Here's a solution to get prepositional phrases (PPs). In general, you can get a phrase from the dependency parse by taking the subtree of its head token (token.subtree).

    def get_pps(doc):
        """Get PPs from a parsed document."""
        pps = []
        for token in doc:
            # Try this with other parts of speech for different subtrees.
            if token.pos_ == 'ADP':
                # The subtree is the token plus all of its syntactic descendants.
                pp = ' '.join(tok.text for tok in token.subtree)
                pps.append(pp)
        return pps
    

    Usage:

    import spacy
    
    nlp = spacy.load('en_core_web_sm')
    ex = 'A short man in blue jeans is working in the kitchen.'
    doc = nlp(ex)
    
    print(get_pps(doc))
    

    This prints:

    ['in blue jeans', 'in the kitchen']
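
The comment in get_pps suggests trying other parts of speech. Here is a minimal sketch of that idea, not a built-in spaCy feature: the helper name get_phrase_spans and its pos_tags parameter are made up for illustration. It returns Span objects instead of joined strings by slicing the doc between token.left_edge and token.right_edge. So that the example does not depend on what a particular model predicts, the parse below is written out by hand (the tags, heads, and dependency labels are assumptions about a plausible parse); with a loaded pipeline you would pass nlp(ex) instead.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab


def get_phrase_spans(doc, pos_tags=("ADP",)):
    """Return the subtree of every token whose coarse POS tag is in pos_tags.

    Hypothetical helper: each result is a Span, so character offsets and the
    original spacing are preserved.
    """
    spans = []
    for token in doc:
        if token.pos_ in pos_tags:
            # left_edge and right_edge are the first and last tokens of the
            # subtree. Slicing between them gives one contiguous Span; in a
            # non-projective parse this may include tokens outside the subtree.
            spans.append(doc[token.left_edge.i : token.right_edge.i + 1])
    return spans


# Hand-annotated parse (an assumption, standing in for a model's output).
words = ["A", "short", "man", "in", "blue", "jeans",
         "is", "working", "in", "the", "kitchen", "."]
pos = ["DET", "ADJ", "NOUN", "ADP", "ADJ", "NOUN",
       "AUX", "VERB", "ADP", "DET", "NOUN", "PUNCT"]
# Absolute index of each token's head; the root points at itself.
heads = [2, 2, 7, 2, 5, 3, 7, 7, 7, 10, 8, 7]
deps = ["det", "amod", "nsubj", "prep", "amod", "pobj",
        "aux", "ROOT", "prep", "det", "pobj", "punct"]
doc = Doc(Vocab(), words=words, pos=pos, heads=heads, deps=deps)

# Prepositional phrases, as in get_pps above.
print([span.text for span in get_phrase_spans(doc)])

# Subtrees headed by nouns; note that these overlap with the PPs.
print([span.text for span in get_phrase_spans(doc, pos_tags=("NOUN",))])
```

Because every matching token contributes its full subtree, nested phrases come back as overlapping spans (here, the subtree of "man" contains the PP "in blue jeans"). If you need non-overlapping chunks, you would have to filter the result yourself.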