This Python program lists all the captions available for a YouTube link:
from pytube import YouTube
yt = YouTube('https://youtu.be/5MgBikgcWnY')
captions = yt.captions.all()
for caption in captions:
    print(caption)
The output of the above program is:
<Caption lang="Arabic" code="ar">
<Caption lang="Chinese (China)" code="zh-CN">
<Caption lang="English" code="en">
<Caption lang="English (auto-generated)" code="a.en">
<Caption lang="French" code="fr">
<Caption lang="German" code="de">
<Caption lang="Hungarian" code="hu">
<Caption lang="Italian" code="it">
But I want only the lang and code from the above output, as key/value pairs in a dictionary:
{"Arabic" : "ar", "Chinese" : "zh-CN", "English" : "en",
"French" : "fr", "German" : "de", "Hungarian" : "hu", "Italian" : "it"}
Thanks in advance.
It's pretty simple. Each caption object has a name and a code attribute, so you can build the dictionary in a loop:
from pytube import YouTube
yt = YouTube('https://youtu.be/5MgBikgcWnY')
captions = yt.captions.all()
captions_dict = {}
for caption in captions:
    # Map the caption name to the caption code
    captions_dict[caption.name] = caption.code
If you want a one-liner, use a dict comprehension:
captions_dict = {caption.name: caption.code for caption in captions}
Output:
{'Arabic': 'ar', 'Bangla': 'bn', 'Burmese': 'my', 'Chinese (China)': 'zh-CN',
'Chinese (Taiwan)': 'zh-TW', 'Croatian': 'hr', 'English': 'en',
'English (auto-generated)': 'a.en', 'French': 'fr', 'German': 'de',
'Hebrew': 'iw', 'Hungarian': 'hu', 'Italian': 'it', 'Japanese': 'ja',
'Persian': 'fa', 'Polish': 'pl', 'Portuguese (Brazil)': 'pt-BR',
'Russian': 'ru', 'Serbian': 'sr', 'Slovak': 'sk', 'Spanish': 'es',
'Spanish (Spain)': 'es-ES', 'Thai': 'th', 'Turkish': 'tr',
'Ukrainian': 'uk', 'Vietnamese': 'vi'}
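The dictionary requested in the question leaves out the auto-generated track, while the output above includes it. A minimal sketch of one way to filter it out follows. It assumes that auto-generated tracks are the ones whose code starts with "a." (as 'a.en' does above), which may not hold for every video. FakeCaption and captions_to_dict are hypothetical names used only for illustration, not part of pytube; the stand-in objects let the sketch run without a network call.

```python
from collections import namedtuple

# Hypothetical stand-in for pytube's caption objects, so the sketch runs
# offline; only the .name and .code attributes are used.
FakeCaption = namedtuple("FakeCaption", ["name", "code"])


def captions_to_dict(captions, include_auto=False):
    """Map caption names to codes, optionally skipping auto-generated tracks.

    Assumption: an auto-generated track is one whose code starts with "a."
    (for example 'a.en'); this is inferred from the output shown above.
    """
    return {
        c.name: c.code
        for c in captions
        if include_auto or not c.code.startswith("a.")
    }


sample = [
    FakeCaption("Arabic", "ar"),
    FakeCaption("English", "en"),
    FakeCaption("English (auto-generated)", "a.en"),
    FakeCaption("French", "fr"),
]
result = captions_to_dict(sample)
print(result)
```

With the real library, the same function could be called as captions_to_dict(yt.captions.all()). Shortening names such as 'Chinese (China)' to 'Chinese' is deliberately not done here, because 'Chinese (China)' and 'Chinese (Taiwan)' would then collide on the same key.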