python nlp nltk sentiment-analysis naivebayes

PoS Implementation with Naive Bayes Sentiment Analysis


I am trying to apply sentiment analysis (classifying tweets as positive or negative) to a relatively large dataset (10,000 rows). So far I have reached only ~73% accuracy using Naive Bayes with the features extracted by my function "final", shown below. I want to add part-of-speech (PoS) tags to help with the classification, but I am unsure how to implement this. I tried writing a simple function called "pos" (also posted below) and using the tags of my cleaned dataset as features, but that gave only around 52% accuracy. Can anyone point me in the right direction for adding PoS to my model? Thank you.

import nltk

def pos(word):
    # word: a list of tokens; nltk.pos_tag returns (token, tag) pairs
    return [t for w, t in nltk.pos_tag(word)]


def final(text):

   """
   I have code here to remove URLs, hashtags,
   stopwords, usernames, numerals, and punctuation.
   """

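   # Lemmatization: `clean` is the token list left by the cleaning step above;
   # `lem` is assumed to be a lemmatizer such as nltk.stem.WordNetLemmatizer().
   finished = []
   for x in clean:
      finished.append(lem.lemmatize(x))

   return finished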

Solution

  • Split each tweet into sentences first, then tokenize each sentence into words. NLTK provides sent_tokenize for the sentence-splitting step:

       from nltk.tokenize import sent_tokenize
       sents = sent_tokenize(tweet)
    

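    After this, tokenize each sentence into words (for example with nltk.word_tokenize) and pass the resulting token list to nltk.pos_tag, which expects a list of tokens rather than a list of sentences. Run the tagger on the original text, before removing stopwords and punctuation, because it relies on sentence context. That should give more accurate PoS tags.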
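
    As a rough sketch of how the tags could then be turned into Naive Bayes features: the helper names tag_tweet and pos_features below are hypothetical, and the feature scheme (word/tag pairs plus coarse tag-presence flags) is one possible choice, not something taken from the question. It assumes NLTK is installed and that its sentence tokenizer and tagger data have been downloaded with nltk.download.

```python
def tag_tweet(tweet):
    """Return (word, tag) pairs for a raw, uncleaned tweet (hypothetical helper)."""
    # Imported here so pos_features below can be used without NLTK's data files.
    from nltk import pos_tag, sent_tokenize, word_tokenize

    tagged = []
    for sent in sent_tokenize(tweet):
        # pos_tag expects a list of word tokens, not a raw string
        tagged.extend(pos_tag(word_tokenize(sent)))
    return tagged


def pos_features(tagged_tokens):
    """Build a feature dict from (word, tag) pairs (hypothetical feature scheme)."""
    features = {}
    for word, tag in tagged_tokens:
        # Word paired with its tag, so "like/VB" and "like/IN" are distinct features
        features[f"{word.lower()}/{tag}"] = True
        # Coarse tag class (first two letters: NN, VB, JJ, RB, ...)
        features[f"has({tag[:2]})"] = True
    return features
```

    Each tweet would then become a (pos_features(tag_tweet(tweet)), label) pair, and the list of such pairs can be passed to nltk.NaiveBayesClassifier.train. Cleaning steps such as stopword removal can still be applied afterwards by filtering the tagged pairs, so the tagger always sees the full sentence.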