Tags: python, wildcard, pytube

Problem with pytube caption selection after a recent update


The language code format of pytube captions seems to have changed.

from pytube import YouTube
video_link = r'https://www.youtube.com/watch?v=w7daiJHfjoY'
yt = YouTube(video_link)
print(yt.captions)

The result now looks like this:

{'a.de': <Caption lang="German (auto-generated)" code="a.de">, 'de.CcQ45jRV4-E': <Caption lang="German - deutsch" code="de.CcQ45jRV4-E">}

Previously, I could extract the subtitle simply with yt.captions.get_by_language_code('de').

But because the language code of the caption is now de.CcQ45jRV4-E, I need to use yt.captions.get_by_language_code('de.CcQ45jRV4-E').

Although that works, I don't know whether this language code is fixed or not. How can I use a string wildcard to get the subtitle I want from the captions? Something like: yt.captions.get_by_language_code('de*')

Thank you.


Solution

  • Iterate over the captions:

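    from pytube import YouTube

    video_link = r'https://www.youtube.com/watch?v=w7daiJHfjoY'
    yt = YouTube(video_link)

    # Default to None so the final print does not raise NameError when nothing matches.
    caption = None
    for c in yt.captions:
        # Match codes such as "de.CcQ45jRV4-E" but not "a.de" (auto-generated).
        if c.code.startswith("de."):
            caption = c
            break
    print(caption)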
    

    This assumes that there will always be a dot after "de". For more complex matching you could use a regex, but I don't think it's necessary here.
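If you want something closer to the 'de*' wildcard from the question, the standard-library fnmatch module may be enough. The following is only a sketch: find_matching_codes is a hypothetical helper (not part of pytube), and the list of codes is copied from the printed output in the question so the example runs without a network call.

```python
from fnmatch import fnmatchcase


def find_matching_codes(codes, pattern):
    """Return the codes that match a shell-style wildcard pattern, in order.

    Hypothetical helper for illustration; it is not part of pytube.
    """
    return [code for code in codes if fnmatchcase(code, pattern)]


# Codes copied from the printed output in the question.
codes = ["a.de", "de.CcQ45jRV4-E"]

# "de*" only matches codes that start with "de", so "a.de" is skipped.
matches = find_matching_codes(codes, "de*")
print(matches)  # ['de.CcQ45jRV4-E']
```

With pytube itself, the codes could presumably be collected with [c.code for c in yt.captions], and the first match looked up with the same get_by_language_code call used in the question. Note that 'de*' would also match any other code that merely starts with "de"; the stricter pattern 'de.*' requires the dot, at the cost of missing a plain 'de'.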