The language code format of pytube captions seems to have changed.
from pytube import YouTube
video_link = r'https://www.youtube.com/watch?v=w7daiJHfjoY'
yt = YouTube(video_link)
print(yt.captions)
The result now looks like this:
{'a.de': <Caption lang="German (auto-generated)" code="a.de">, 'de.CcQ45jRV4-E': <Caption lang="German - deutsch" code="de.CcQ45jRV4-E">}
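The codes in this output seem to follow a pattern: an optional `a.` prefix for the auto-generated track, then the language, then an optional `.`-separated suffix. The sketch below splits a code into those parts so that only the language is compared and the suffix never matters. The format is inferred from this single output and is an assumption, not documented pytube behaviour; `CaptionCode` and `parse_caption_code` are hypothetical helpers.

```python
from typing import NamedTuple


class CaptionCode(NamedTuple):
    """Hypothetical helper type; not part of pytube."""
    auto_generated: bool
    language: str
    track_id: str  # empty string when the code has no suffix


def parse_caption_code(code: str) -> CaptionCode:
    """Split codes like 'a.de' or 'de.CcQ45jRV4-E' into their parts.

    Assumes the format seen above: an optional 'a.' prefix marks an
    auto-generated track, and anything after the next '.' is an opaque suffix.
    """
    auto = code.startswith("a.")
    rest = code[2:] if auto else code
    language, _, track_id = rest.partition(".")
    return CaptionCode(auto, language, track_id)


print(parse_caption_code("a.de"))
print(parse_caption_code("de.CcQ45jRV4-E"))
```

With pytube, a manually created German track could then be selected with something like `[c for c in yt.captions if parse_caption_code(c.code) [:2] == (False, "de")]`, which still works if the suffix changes.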
Before, I could extract the subtitle simply with
yt.captions.get_by_language_code('de')
but now that the caption's language code has become de.CcQ45jRV4-E, I need to use yt.captions.get_by_language_code('de.CcQ45jRV4-E').
Although this works, I don't know whether this language code is fixed or can change.
How can I use a string wildcard to get the subtitle I want from the captions? Something like:
yt.captions.get_by_language_code('de*')
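As far as I know, get_by_language_code does not accept wildcards, but the same effect can be had by filtering the captions yourself. Below is a minimal sketch using the standard library's `fnmatch` module for shell-style patterns like `de*`. It assumes, as the answer below does, that iterating `yt.captions` yields caption objects with a `.code` attribute. `find_caption` is a hypothetical helper and `FakeCaption` is a stand-in so the example runs without a network call.

```python
from fnmatch import fnmatchcase
from typing import Iterable, NamedTuple, Optional


class FakeCaption(NamedTuple):
    """Hypothetical stand-in for pytube's Caption; only .code and .name are used."""
    code: str
    name: str


def find_caption(captions: Iterable, pattern: str) -> Optional[object]:
    """Return the first caption whose code matches a shell-style wildcard, else None."""
    for caption in captions:
        # fnmatchcase: '*' matches any run of characters, and matching is case-sensitive
        if fnmatchcase(caption.code, pattern):
            return caption
    return None


# Codes copied from the yt.captions output in the question.
captions = [
    FakeCaption("a.de", "German (auto-generated)"),
    FakeCaption("de.CcQ45jRV4-E", "German - deutsch"),
]
print(find_caption(captions, "de*"))  # the manual track; "a.de" does not start with "de"
```

With a real video this would be called as `find_caption(yt.captions, "de*")`. The pattern `de*` also matches a bare `de`, so it covers both the old and the new code format.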
Thank you.
Iterate over the captions:
from pytube import YouTube
video_link = r'https://www.youtube.com/watch?v=w7daiJHfjoY'
yt = YouTube(video_link)
caption = None  # stays None if no matching track is found
for c in yt.captions:
    if "de." in c.code:
        caption = c
        break
print(caption)
This assumes that there will always be a dot after "de". For more complex matching, use a regex, but I don't think it's necessary.
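If the dot cannot be relied on, a regex can cover both the old and new formats. The sketch below uses `^de(?:\.|$)`, which matches `de` on its own or `de.` followed by any suffix. The `^` anchor keeps the auto-generated `a.de` track out. The pattern is an illustrative choice, not something pytube defines; `matching_captions` is a hypothetical helper, and `FakeCaption` is a stand-in for pytube's Caption.

```python
import re
from typing import Iterable, List, NamedTuple


class FakeCaption(NamedTuple):
    """Hypothetical stand-in for pytube's Caption; only .code is used."""
    code: str


# "de" alone (old format) or "de." plus a track suffix (new format).
# re.match anchors at the start, so "a.de" is not matched.
GERMAN = re.compile(r"^de(?:\.|$)")


def matching_captions(captions: Iterable, pattern: re.Pattern) -> List:
    """Return every caption whose code matches the compiled regex."""
    return [c for c in captions if pattern.match(c.code)]


captions = [FakeCaption("a.de"), FakeCaption("de.CcQ45jRV4-E"), FakeCaption("de")]
print([c.code for c in matching_captions(captions, GERMAN)])
```

Compared with the substring test `"de." in c.code`, the anchored pattern will not accidentally match a code that merely contains `de.` somewhere in the middle.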