Tags: python, python-3.x, google-api, youtube-data-api, google-api-python-client

Get Only Videos That Have Subtitles Using the YouTube API


I have Python code that searches for videos using the YouTube API.
My goal is to retrieve only videos that have subtitles/CC, just like the subtitles/CC search filter on the YouTube website.

My current code:

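from googleapiclient.discovery import build

# YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION and DEVELOPER_KEY
# are assumed to be defined elsewhere in the script.
videos = []

def get_videos_by_query(query: str, maxResults: int = 50, pageToken: str = None):
    youtube = build(YOUTUBE_API_SERVICE_NAME,
                    YOUTUBE_API_VERSION,
                    developerKey=DEVELOPER_KEY)
    try:
        # Search for the newest results that match the query.
        search_response = youtube.search().list(
            part="id,snippet",
            order='date',
            maxResults=maxResults,
            pageToken=pageToken,
            q=query
        ).execute()

        # Keep only the video results (channels and playlists are skipped).
        for search_result in search_response.get("items", []):
            if search_result["id"]["kind"] == "youtube#video":
                videoId = search_result["id"]["videoId"]
                videos.append(videoId)

    except Exception as e:
        print(e)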

How can I achieve that?


Solution

  • According to the docs of the Search.list API endpoint, to filter the result set as desired, you should use the following parameter:

    videoCaption (string)

    The videoCaption parameter indicates whether the API should filter video search results based on whether they have captions. If you specify a value for this parameter, you must also set the type parameter's value to video.

    Acceptable values are:

    • any – Do not filter results based on caption availability.
    • closedCaption – Only include videos that have captions.
    • none – Only include videos that do not have captions.

    Consequently, replace your call to youtube.search().list() above with the following one:

    search_response = youtube.search().list(
            part="id,snippet",
            order='date',
            type='video',
            videoCaption='closedCaption',
            maxResults=maxResults,
            pageToken=pageToken,
            q=query
            ).execute()
    

    Note that with this change, the following check becomes superfluous:

    if search_result["id"]["kind"] == "youtube#video":
    

    This is because, with type='video' in the call to the API endpoint, every item in the result set necessarily refers to a video.
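
Putting the pieces together, here is a minimal sketch of how the filtered search could be wrapped in a function that also follows the nextPageToken field of each response to fetch further pages. The function name collect_captioned_video_ids and its page_size and max_pages parameters are hypothetical, introduced only for illustration. The client object is passed in rather than built inside the function, so the sketch makes no assumption about how your API key is stored.

```python
from typing import Any, List, Optional


def collect_captioned_video_ids(youtube: Any, query: str,
                                page_size: int = 50,
                                max_pages: int = 3) -> List[str]:
    """Return the IDs of videos matching `query` that have captions.

    `youtube` is expected to be a client object as returned by
    googleapiclient.discovery.build(). `page_size` and `max_pages`
    are illustrative names, not part of the API.
    """
    video_ids: List[str] = []
    page_token: Optional[str] = None

    for _ in range(max_pages):
        response = youtube.search().list(
            part="id,snippet",
            order="date",
            type="video",                  # required when videoCaption is set
            videoCaption="closedCaption",  # only videos that have captions
            maxResults=page_size,
            pageToken=page_token,
            q=query,
        ).execute()

        # With type="video", every item refers to a video, so no
        # check of item["id"]["kind"] is needed here.
        for item in response.get("items", []):
            video_ids.append(item["id"]["videoId"])

        # Stop when the API reports no further page.
        page_token = response.get("nextPageToken")
        if not page_token:
            break

    return video_ids
```

With the constants from the question, it could be called as collect_captioned_video_ids(build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, developerKey=DEVELOPER_KEY), "some query"). Capping the number of pages is a design choice that limits quota usage; adjust or remove it as needed.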