
SpaCy Dependency Matcher Parsing with Pandas Dataframe


I am having difficulty passing a dataframe column through the SpaCy Dependency Matcher. I attempted to modify the solution found in a previous question, 'Spacy Dependency Parsing with Pandas dataframe', but had no luck.

import pandas as pd
import spacy
from spacy import displacy
from spacy.matcher import DependencyMatcher
from spacy.symbols import nsubj, VERB, dobj, NOUN

nlp = spacy.load("en_core_web_lg")
text = 'REPAIRED CONNECTOR ON J3 SMS. REPLACED THE PRIMARY COMPUTER.'.lower()
dep_matcher  = DependencyMatcher(vocab = nlp.vocab)
dep_pattern = [
    {
        "RIGHT_ID": "action",
        "RIGHT_ATTRS": {'LEMMA' : {"IN": ['reseat', 'cycle', 'replace', 'repair', 'reinstall', 'clean', 'treat', 'splice', 'swap', 'read', 'inspect', 'install']}}
    },

    {
        "LEFT_ID": "action",
        "REL_OP": ">",
        "RIGHT_ID": "component",
        "RIGHT_ATTRS": {"DEP":{"IN": [ 'dobj']}},     
    }]

dep_matcher.add('maint_action' , patterns = [dep_pattern])
dep_matches = dep_matcher(doc)

for match in dep_matches:
    dep_pattern = match[0]
    matches = match[1]
    verb , subject = matches[0], matches[1] 
    print (nlp.vocab[dep_pattern].text, '\t' ,doc[verb] , doc[subject])
>>>maint_action   repaired connector
>>>maint_action   replaced computer 

When I pass a string, the above works perfectly, but when I try passing a DataFrame, the new column comes back blank.

Here's the function for the DataFrame:

import pandas as pd
    import spacy
    from spacy import displacy
    from spacy.matcher import DependencyMatcher
    from spacy.symbols import nsubj, VERB, dobj, NOUN

nlp = spacy.load("en_core_web_lg")
data = {'new':  ['repaired computer and replaced connector.', 'spliced wire on connector.', 'cycled power and reseated connectors and replaced computer on transmitter.']}

df = pd.DataFrame(data)    

dep_matcher  = DependencyMatcher(vocab = nlp.vocab)
    dep_pattern = [
        {
            "RIGHT_ID": "action",
            "RIGHT_ATTRS": {'LEMMA' : {"IN": ['reseat', 'cycle', 'replace', 'repair', 'reinstall', 'clean', 'treat', 'splice', 'swap', 'read', 'inspect', 'install']}}
        },
    
        {
            "LEFT_ID": "action",
            "REL_OP": ">",
            "RIGHT_ID": "component",
            "RIGHT_ATTRS": {"DEP":{"IN": [ 'dobj']}},     
        }]
    
    dep_matcher.add('maint_action' , patterns = [dep_pattern])
    dep_matches = dep_matcher(doc)
def find_matches(text):
        doc = nlp(text)
        rule3_pairs = []
        for match in dep_matches:
            dep_pattern = match[0]
            matches = match[1]
            verb , subject = matches[0], matches[1] 
            A = (nlp.vocab[dep_pattern].text, '\t' ,doc[verb] , doc[subject])
            rule3_pairs.append(A)
            return rule3_pairs
      
df['three_tuples'] = df['new'].apply(find_matches) 

I want each row that matches the pattern to output its verb and noun pairs, such as:

|three_tuples|
|maint_action    repaired computer  replaced connector|
|maint_action    spliced wire|
|maint_action    cycled power  reseated connectors  replaced computer|

Solution

  • I executed your second code sample exactly as posted, and it already produces the results you want.
    There is one small problem in the first code sample: you never create the doc before calling the matcher:
    doc = nlp(text)
    But I don't think that's what's causing the issue. If you're using Jupyter, try restarting your kernel.

    Update

    After your edit, I noticed that the code has a lot of indentation errors; please fix those.
    Also, you are calling dep_matcher outside the function rather than inside it, so the matches are never computed for the doc built from each row. That's why it won't work.
    Finally, the return statement inside the for loop exits the function after the first match. Move the return out of the loop if you want all the results.
    Here's the code that worked for me:
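    def find_matches(text):
        doc = nlp(text)
        # Run the matcher on this row's doc so the token indices refer to it
        dep_matches = dep_matcher(doc)
        rule3_pairs = []
        for match in dep_matches:
            dep_pattern = match[0]  # match ID for the pattern name
            matches = match[1]      # token indices, in pattern order
            verb, subject = matches[0], matches[1]  # "action" and "component" tokens
            A = (nlp.vocab[dep_pattern].text, doc[verb], doc[subject])
            rule3_pairs.append(A)
        # Return after the loop so every match is collected
        return rule3_pairs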
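
To see the effect of the return placement in isolation, here is a minimal sketch that runs without spaCy or a model. Note that toy_matcher and ACTIONS are hypothetical stand-ins invented for this illustration; they only mimic the (label, token indices) shape of the matches and are not part of spaCy's API.

```python
# Hypothetical stand-in for dep_matcher(doc): yields (label, [verb_idx, obj_idx])
# for every known action word that is directly followed by another word.
ACTIONS = {"repaired", "replaced", "spliced"}

def toy_matcher(tokens):
    return [("maint_action", [i, i + 1])
            for i, tok in enumerate(tokens[:-1]) if tok in ACTIONS]

def find_first_only(text):
    tokens = text.split()
    pairs = []
    for label, (verb, obj) in toy_matcher(tokens):
        pairs.append((label, tokens[verb], tokens[obj]))
        return pairs  # inside the loop: the function exits after the first match
    # no match: the loop body never runs and the function returns None

def find_all(text):
    tokens = text.split()
    pairs = []
    for label, (verb, obj) in toy_matcher(tokens):  # matcher runs on every call
        pairs.append((label, tokens[verb], tokens[obj]))
    return pairs  # after the loop: every match is kept, [] when there is none

rows = ["repaired computer and replaced connector", "checked wiring"]
buggy = [find_first_only(r) for r in rows]
fixed = [find_all(r) for r in rows]
```

With the return inside the loop, the first row keeps only its first pair and the row without a match yields None; with the return after the loop, the first row keeps both pairs and the unmatched row yields an empty list.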