python pandas nlp spacy sentiment-analysis

Spacy Dependency Parsing with Pandas dataframe


I would like to extract noun-adjective pairs for Aspect Based Sentiment Analysis using spaCy's dependency parser on my pandas dataframe. I was trying the code from the following post on the Amazon Fine Food Reviews dataset from Kaggle: Named Entity Recognition in aspect-opinion extraction using dependency rule matching

However, something seems to be wrong with the way I feed my pandas dataframe to spaCy. My results are not what I would expect. Could someone help me debug this, please? Thanks a lot.

!python -m spacy download en_core_web_lg
import nltk
nltk.download('vader_lexicon')

import spacy
nlp = spacy.load("en_core_web_lg")

from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()


def find_sentiment(doc):
    # find roots of all entities in the text
  for i in df['Text'].tolist():
    doc = nlp(i)
    ner_heads = {ent.root.idx: ent for ent in doc.ents}
    rule3_pairs = []
    for token in doc:
        children = token.children
        A = "999999"
        M = "999999"
        add_neg_pfx = False
        for child in children:
            if(child.dep_ == "nsubj" and not child.is_stop): # nsubj is nominal subject
                if child.idx in ner_heads:
                    A = ner_heads[child.idx].text
                else:
                    A = child.text
            if(child.dep_ == "acomp" and not child.is_stop): # acomp is adjectival complement
                M = child.text
            # example - 'this could have been better' -> (this, not better)
            if(child.dep_ == "aux" and child.tag_ == "MD"): # MD is modal auxiliary
                neg_prefix = "not"
                add_neg_pfx = True
            if(child.dep_ == "neg"): # neg is negation
                neg_prefix = child.text
                add_neg_pfx = True
        if (add_neg_pfx and M != "999999"):
            M = neg_prefix + " " + M
        if(A != "999999" and M != "999999"):
            rule3_pairs.append((A, M, sid.polarity_scores(M)['compound']))
    return rule3_pairs
df['three_tuples'] = df['Text'].apply(find_sentiment) 
df.head()

My result (screenshot not included here) clearly shows that something is wrong with my loop.


Solution

  • If you call apply on df['Text'], then you are essentially looping over every value in that column and passing that value to a function.

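    Here, however, your function itself iterates over the same dataframe column that you are applying it to, and it overwrites the value passed in (doc) on the first line of that loop. Because the return statement also sits inside the loop, each call returns after processing only the first review, so every row ends up with the same result.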

    So I would start by rewriting the function as follows and see if it produces the intended results. I can't say for sure, as you didn't post any sample data, but this should at least move the ball forward:

    def find_sentiment(text):
        doc = nlp(text)
        ner_heads = {ent.root.idx: ent for ent in doc.ents}
        rule3_pairs = []
        for token in doc:
            children = token.children
            A = "999999"
            M = "999999"
            add_neg_pfx = False
            for child in children:
                if(child.dep_ == "nsubj" and not child.is_stop): # nsubj is nominal subject
                    if child.idx in ner_heads:
                        A = ner_heads[child.idx].text
                    else:
                        A = child.text
                if(child.dep_ == "acomp" and not child.is_stop): # acomp is adjectival complement
                    M = child.text
                # example - 'this could have been better' -> (this, not better)
                if(child.dep_ == "aux" and child.tag_ == "MD"): # MD is modal auxiliary
                    neg_prefix = "not"
                    add_neg_pfx = True
                if(child.dep_ == "neg"): # neg is negation
                    neg_prefix = child.text
                    add_neg_pfx = True
            if (add_neg_pfx and M != "999999"):
                M = neg_prefix + " " + M
            if(A != "999999" and M != "999999"):
                rule3_pairs.append((A, M, sid.polarity_scores(M)['compound']))
        return rule3_pairs
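
To see the difference in isolation, here is a minimal sketch that leaves spaCy out entirely. The two-row dataframe and the functions first_word and first_word_buggy are hypothetical stand-ins (they are not part of the question's code); the buggy variant only mimics the shape of the original find_sentiment, assuming the indentation shown in the question.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the reviews data.
df = pd.DataFrame({"Text": ["good coffee", "stale bread"]})


def first_word(text):
    # Stand-in for the rewritten find_sentiment: works on ONE string,
    # namely the value that apply passes in for the current row.
    return text.split()[0]


def first_word_buggy(text):
    # Mimics the original bug: the argument is overwritten by a loop
    # over the whole column, and the return inside the loop exits
    # during the first iteration.
    for text in df["Text"].tolist():
        return text.split()[0]


fixed = df["Text"].apply(first_word).tolist()
buggy = df["Text"].apply(first_word_buggy).tolist()

print(fixed)  # ['good', 'stale']
print(buggy)  # ['good', 'good']
```

The buggy version gives every row the result for the first row, which matches the symptom of identical output in every row. With the rewritten find_sentiment, the call from the question, df['three_tuples'] = df['Text'].apply(find_sentiment), can stay as it is.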