python dataframe machine-learning nlp data-science

Anyone have a way to tokenize a paragraph, put each sentence into a pandas data frame, and perform sentiment analysis on each?

Beginner NLP/Python programmer. Title says it all. I need code that will tokenize a paragraph into sentences, perform sentiment analysis on each sentence, and put each sentence along with its rating into a pandas data frame. I already have code that can tokenize a paragraph and even perform sentiment analysis, but I'm struggling with putting both into a data frame. Thus far, I have:

I used newspaper3k to extract the article text from the URL.

from newspaper import fulltext
import requests
url = "https://www.click2houston.com/news/local/2021/06/18/houston-water-wastewater-proposed-increase-this-is-what-mayor-sylvester-turner-wants-you-to-know/"
text = fulltext(requests.get(url).text)

Then I used the BERT extractive summarizer to summarize the article text.

from summarizer import Summarizer  # from the bert-extractive-summarizer package
models = Summarizer()
result = models(text, min_length=30)
full = "".join(result)
type(full)

Then I tokenized the summary into sentences using nltk.

from nltk.tokenize import sent_tokenize
import numpy as np
tokens = sent_tokenize(full)
print(type(np.array(tokens)[0]))

Lastly, I put it into a basic dataframe.

import pandas as pd
df = pd.DataFrame(tokens, columns=['sentences'])  # a plain list of strings works; no NumPy array needed

The only thing I'm missing is the sentiment analysis. I simply need a sentiment rating (preferably from BERT) for each sentence, added to the data frame.

Solution

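Hugging Face's transformers library can do this: its pipeline API provides both summarization and sentiment analysis. The example below summarizes and classifies the first ten non-empty lines of the article.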

from transformers import pipeline
from newspaper import fulltext
import requests
import pandas as pd
import numpy as np
url = "https://www.click2houston.com/news/local/2021/06/18/houston-water-wastewater-proposed-increase-this-is-what-mayor-sylvester-turner-wants-you-to-know/"
text = fulltext(requests.get(url).text)
texts = [item.strip() for item in text.split('\n')[:10] if item.strip()]
summarizer = pipeline("summarization")
sentiment_analyser = pipeline('sentiment-analysis')
# Each pipeline call returns a list with one dict per input; keep only the field of interest
summarize = lambda line: summarizer(line, min_length=5, max_length=30)[0]['summary_text']
sentiment_analyse = lambda line: sentiment_analyser(line)[0]['label']
df = pd.DataFrame(np.array(texts), columns=['lines'])
df['Summarized'] = df.lines.apply(summarize)
df['Sentiment'] = df.lines.apply(sentiment_analyse)
print(df.head())
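
The code above works on lines of the article, whereas the question asks for a rating per sentence. A minimal sketch of that variant follows, assuming the sentences already sit in a 'sentences' column as in the question. The helper names add_sentiment and build_classifier are hypothetical, introduced only for illustration; the classifier is passed in as a plain callable, so any function that returns one {'label', 'score'} dict per input string (the shape a transformers sentiment-analysis pipeline returns) would fit.

```python
import pandas as pd


def add_sentiment(df, classify, column="sentences"):
    """Return a copy of df with 'label' and 'score' columns added.

    `classify` is any callable that takes a list of strings and returns
    one {'label': ..., 'score': ...} dict per string.
    """
    # One batched call over all sentences instead of one call per row
    results = classify(df[column].tolist())
    out = df.copy()
    out["label"] = [r["label"] for r in results]
    out["score"] = [r["score"] for r in results]
    return out


def build_classifier():
    """Load the default sentiment pipeline (downloads a model on first use)."""
    from transformers import pipeline
    return pipeline("sentiment-analysis")


# Usage with the question's sentence-level frame (not executed here):
# df = pd.DataFrame({"sentences": tokens})
# df = add_sentiment(df, build_classifier())
```

Keeping the label and the score in separate columns makes it easy to filter or sort afterwards, for example df[df.label == "NEGATIVE"].sort_values("score").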