I'm getting an error when attempting to run this code:
import nltk
nltk.download('punkt')
from youtube_transcript_api import YouTubeTranscriptApi
video_id = 'wK4XmXJ299k'
transcript = YouTubeTranscriptApi.get_transcript(video_id)
corpus = ' '.join([line['text'] for line in transcript])
from transformers import pipeline
mysummarization = pipeline("summarization")
mysummary = mysummarization(corpus)
mysummary[0]['summary_text']
The code fetches the transcript of a YouTube video and tries to summarize it with a Hugging Face Transformers summarization pipeline. It fails with IndexError: index out of range in self.
I am also seeing this message:
No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Token indices sequence length is longer than the specified maximum sequence length for this model (11628 > 1024). Running this sequence through the model will result in indexing errors
How do I fix this?
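The warning points at the cause: the full transcript tokenizes to 11628 tokens, but the model accepts at most 1024, which leads to the IndexError. Summarizing only the first part of the transcript avoids it:
import nltk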
nltk.download('punkt')
from youtube_transcript_api import YouTubeTranscriptApi
video_id = 'wK4XmXJ299k'
transcript = YouTubeTranscriptApi.get_transcript(video_id)
corpus = ' '.join([line['text'] for line in transcript[:100]])  # keep only the first 100 segments so the input stays under the model's 1024-token limit
print(corpus)
from transformers import pipeline
mysummarization = pipeline("summarization", min_length=30, max_length=90)
mysummary = mysummarization(corpus)
print(mysummary[0]['summary_text'])
Output - Aaron Rodgers is fresh out of a victory over the Super Bowl champion Los Angeles Rams and Lambeau last evening on Monday Night Football . The back-to-back NFL MVP says he's looking forward to a holiday party .
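Note that the summary above covers only the first 100 transcript segments. If the whole video should be summarized, one option is to split the text into pieces that fit the model, summarize each piece, and join the results. The following is a minimal sketch of that idea, not a definitive implementation. The helper names `chunk_words` and `summarize_long` are hypothetical, and the 400-word chunk size is an assumption meant to stay comfortably below 1024 tokens; it is a rough proxy, since words and tokens are not the same thing.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive pieces of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text, summarize, max_words=400):
    """Apply `summarize` (any callable str -> str) to each chunk and join the partial summaries."""
    return " ".join(summarize(chunk) for chunk in chunk_words(text, max_words))


# Possible usage with the pipeline from the question (not executed here,
# because it downloads a model):
#
# from transformers import pipeline
# summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
# summary = summarize_long(
#     corpus,
#     lambda chunk: summarizer(chunk, min_length=30, max_length=90,
#                              truncation=True)[0]["summary_text"],
# )
# print(summary)
```

Passing truncation=True is a safeguard in case a chunk still exceeds the limit: the tokenizer then cuts the input instead of raising an indexing error. Naming the model explicitly also silences the "No model was supplied" warning.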