Tags: python, jupyter, youtube-data-api, google-api-python-client, keyerror

KeyError: 'likeCount' and IndexError: list index out of range in Python


I am using a Python Jupyter notebook to get statistics such as total likes, dislikes, and views for all the videos (~14k) of a particular YouTube channel. I used some code I found on GitHub. Everything runs until the last section: when I run the lines below, I get "KeyError: 'commentCount'". The same happens with "likeCount", "dislikeCount", etc.

for i in range(len(allVideos)):
    i += 1
    title.append((allVideos[i])['snippet']['title'])
    publishedDate.append((allVideos[i])['snippet']['publishedAt'])
    video_description.append((allVideos[i])['snippet']['description'])
    liked.append(int((stats[i])['statistics']['likeCount']))
    disliked.append(int((stats[i])['statistics']['dislikeCount']))
    views.append(int((stats[i])['statistics']['viewCount']))
    comment.append(int((stats[i])['statistics']['commentCount']))
    videoid.append(allVideos[i]['snippet']['resourceId']['videoId'])

KeyError: 'commentCount'

I suspect this happens because comments, likes, and dislikes are disabled for some videos. How do I fix this?

I commented out some of the metric lines above and reran the code. Now I get "IndexError: list index out of range":

for i in range(len(allVideos)):
    i += 1
    title.append((allVideos[i])['snippet']['title'])
    #publishedDate.append((allVideos[i])['snippet']['publishedAt'])
    #video_description.append((allVideos[i])['snippet']['description'])
    #liked.append(int((stats[i])['statistics']['likeCount']))
    #disliked.append(int((stats[i])['statistics']['dislikeCount']))
    #views.append(int((stats[i])['statistics']['viewCount']))
    #comment.append(int((stats[i])['statistics']['commentCount']))
    #videoid.append(allVideos[i]['snippet']['resourceId']['videoId'])

IndexError: list index out of range

Can anyone help me with this? I have tried several ways to fix it, without success.


Solution

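  • You get "list index out of range" because of i += 1, which you don't need. range(len(allVideos)) already yields the valid indexes 0 to len(allVideos) - 1; adding 1 shifts them to 1 to len(allVideos), so the first video is skipped and the last index points past the end of the list.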
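
    A tiny self-contained demonstration of the off-by-one, using a made-up three-item list in place of allVideos (the read_all helper is hypothetical, just for illustration):

```python
# Made-up three-item list standing in for allVideos.
all_videos = ["a", "b", "c"]

# range(len(...)) already yields the valid indexes 0, 1, 2.
indexes = list(range(len(all_videos)))

# Adding 1 inside the loop shifts them to 1, 2, 3: index 0 is never
# read and index 3 does not exist.
shifted = [i + 1 for i in range(len(all_videos))]


def read_all(items, offset):
    """Hypothetical helper: read items[i + offset] for every i in range(len(items))."""
    result = []
    for i in range(len(items)):
        result.append(items[i + offset])
    return result


print(indexes)                  # [0, 1, 2]
print(shifted)                  # [1, 2, 3]
print(read_all(all_videos, 0))  # ['a', 'b', 'c']

try:
    read_all(all_videos, 1)     # what the loop with i += 1 effectively does
except IndexError as exc:
    print("IndexError:", exc)   # IndexError: list index out of range
```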

    You can also write the for-loop without range(len(...)):

    for video in allVideos:
        title.append( video['snippet']['title'] )
        publishedDate.append( video['snippet']['publishedAt'] )
    

    If you need the index inside the loop, use enumerate():

    for number, video in enumerate(allVideos):
        title.append( video['snippet']['title'] )
        publishedDate.append( video['snippet']['publishedAt'] )
    
        comment.append( int(stats[number]['statistics']['commentCount']) )
        liked.append( int(stats[number]['statistics']['likeCount']) )
    

    But in your code it is simpler to use zip():

    for video, stat in zip(allVideos, stats):
        title.append( video['snippet']['title'] )
        publishedDate.append( video['snippet']['publishedAt'] )
    
        comment.append( int(stat['statistics']['commentCount']) )
        liked.append( int(stat['statistics']['likeCount']) )
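
    One caveat with zip(): it silently stops at the shorter list. If the videos().list calls return fewer items than there are playlist entries (I believe unavailable videos are simply left out of the response), every later video gets paired with the wrong statistics. A minimal sketch with hypothetical sample data shaped like the API responses:

```python
import sys

# Hypothetical sample data: two playlist items, but statistics came
# back for only the second video.
all_videos = [
    {"snippet": {"title": "First", "resourceId": {"videoId": "AAA"}}},
    {"snippet": {"title": "Second", "resourceId": {"videoId": "BBB"}}},
]
stats = [
    {"id": "BBB", "statistics": {"viewCount": "10"}},
]

# zip() stops at the shorter list without complaint, so the first video
# is paired with the second video's statistics and "Second" is dropped.
pairs = [
    (video["snippet"]["resourceId"]["videoId"], stat["id"])
    for video, stat in zip(all_videos, stats)
]
print(pairs)  # [('AAA', 'BBB')]

# A cheap sanity check before trusting the pairing: same length and
# matching IDs at every position.
aligned = len(all_videos) == len(stats) and all(
    video["snippet"]["resourceId"]["videoId"] == stat["id"]
    for video, stat in zip(all_videos, stats)
)
print(aligned)  # False

# On Python 3.10+, zip(..., strict=True) raises ValueError instead of
# silently truncating.
if sys.version_info >= (3, 10):
    try:
        list(zip(all_videos, stats, strict=True))
    except ValueError as exc:
        print("ValueError:", exc)
```

    Matching by video ID rather than by position avoids the problem entirely; zip(..., strict=True) at least turns it into a loud error, but it needs Python 3.10 or newer.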
    

    It becomes even more readable if you first assign ['snippet'] and ['statistics'] to variables:

    for video, stat in zip(allVideos, stats):
    
        v = video['snippet']
    
        title.append( v['title'] )
        publishedDate.append( v['publishedAt'] )
    
        s = stat['statistics']
    
        comment.append( int(s['commentCount']) )
        liked.append( int(s['likeCount']) )
    

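    If a video has no commentCount or likeCount (for example, because comments are disabled), the key is missing and you get KeyError. Check for the key with if/else: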

        s = stat['statistics']
    
        if 'commentCount' in s:
            comment.append( int(s['commentCount']) )
        else:
            comment.append( 0 )
    
        if 'likeCount' in s:
            liked.append( int(s['likeCount']) )
        else:
            liked.append( 0 )
    

    or, shorter, use .get(key, default_value):

        s = stat['statistics']
    
        comment.append( int(s.get('commentCount', 0)) )
        liked.append( int(s.get('likeCount', 0)) )
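
    A quick check of the .get() approach on hypothetical sample dicts. As far as I know, the API returns the counts as strings (hence int()), and since YouTube made dislike counts private in late 2021, dislikeCount is usually missing, so it comes out as 0. The count_or_none helper is hypothetical, for when you need to tell a disabled counter apart from a real zero:

```python
# Hypothetical "statistics" dicts shaped like the API output: counts are
# strings, and a key is absent when the counter is hidden or disabled
# (here: comments are off on the second video).
stats = [
    {"statistics": {"viewCount": "7282", "likeCount": "51", "commentCount": "0"}},
    {"statistics": {"viewCount": "3308", "likeCount": "12"}},
]

liked, disliked, views, comment = [], [], [], []

for stat in stats:
    s = stat["statistics"]
    # .get(key, 0) returns 0 instead of raising KeyError when the key is missing.
    liked.append(int(s.get("likeCount", 0)))
    disliked.append(int(s.get("dislikeCount", 0)))
    views.append(int(s.get("viewCount", 0)))
    comment.append(int(s.get("commentCount", 0)))

print(views)     # [7282, 3308]
print(comment)   # [0, 0]  <- a real zero and "disabled" look the same
print(disliked)  # [0, 0]


def count_or_none(s, key):
    """Hypothetical helper: int value of s[key], or None if the key is absent."""
    value = s.get(key)
    return int(value) if value is not None else None


print([count_or_none(stat["statistics"], "commentCount") for stat in stats])  # [0, None]
```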
    

    Full version:

    for video, stat in zip(allVideos, stats):
    
        v = video['snippet']
    
        title.append( v['title'] )
        publishedDate.append( v['publishedAt'] )
        video_description.append( v['description'] )
        videoid.append( v['resourceId']['videoId'] )
    
        s = stat['statistics']
    
        liked.append( int(s.get('likeCount',0)) )
        disliked.append( int(s.get('dislikeCount',0)) )
        views.append( int(s.get('viewCount',0)) )
        comment.append( int(s.get('commentCount',0)) )
    

    But if you want to put the data into a DataFrame, you can skip the separate lists (title, etc.) and build a list of rows instead:

    all_rows = []
    
    for video, stat in zip(allVideos, stats):
    
        v = video['snippet']
        s = stat['statistics']
    
        row = [
            v['title'],
            v['resourceId']['videoId'],
            v['description'],
            v['publishedAt'],
       
            int(s.get('likeCount',0)),
            int(s.get('dislikeCount',0)),
            int(s.get('viewCount',0)),
            int(s.get('commentCount',0)),
        ]
    
        all_rows.append(row)
    
    # - after `for`-loop -
    
    df = pd.DataFrame(
             all_rows, 
             columns=['title', 'videoIDS', 'video_description', 'publishedDate', 'likes', 'dislikes', 'views', 'comment']
         )
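
    A variant of the same idea, sketched under the assumption that dict rows suit you: build each row as a dict keyed by column name, so a value cannot end up under the wrong column. pd.DataFrame accepts a list of dicts and takes the column names from the keys. The sample data and the to_record name are hypothetical:

```python
# Hypothetical one-video sample shaped like the playlistItems and
# videos responses used above.
all_videos = [
    {"snippet": {
        "title": "Example video",
        "resourceId": {"videoId": "VIDEO_ID_1"},
        "description": "Example description",
        "publishedAt": "PUBLISHED_AT_PLACEHOLDER",
    }},
]
stats = [{"statistics": {"viewCount": "42", "likeCount": "3"}}]


def to_record(video, stat):
    """Hypothetical helper: one DataFrame row as a dict keyed by column name."""
    v = video["snippet"]
    s = stat["statistics"]
    return {
        "title": v["title"],
        "videoIDS": v["resourceId"]["videoId"],
        "video_description": v["description"],
        "publishedDate": v["publishedAt"],
        "likes": int(s.get("likeCount", 0)),
        "dislikes": int(s.get("dislikeCount", 0)),
        "views": int(s.get("viewCount", 0)),
        "comment": int(s.get("commentCount", 0)),
    }


records = [to_record(video, stat) for video, stat in zip(all_videos, stats)]
print(records[0]["views"])  # 42

# With pandas installed, the dict keys become the column names:
# import pandas as pd
# df = pd.DataFrame(records)
```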
    

    EDIT:

    Full working code:

    from googleapiclient.discovery import build
    import pandas as pd
    
    youTubeApiKey = "AIzaSy..."
    youtube = build('youtube', 'v3', developerKey=youTubeApiKey)
    
    snippets = youtube.search().list(part="snippet", type="channel", q="nptelhrd").execute()
    print('len(snippets["items"]):', len(snippets['items']))
    
    channel_id = snippets['items'][0]['snippet']['channelId']
    print('channel_id:', channel_id)
    
    # channel-level statistics; kept separate from the per-video `stats` list built below
    channel_stats = youtube.channels().list(part="statistics", id=channel_id).execute()
    print('channel statistics:', channel_stats['items'][0]['statistics'])
    
    #status = youtube.channels().list(id=channel_id, part='status').execute()
    #print('len(status):', len(status))
    
    content = youtube.channels().list(id=channel_id, part='contentDetails').execute()
    print('len(content["items"]):', len(content['items']))
    
    upload_id = content['items'][0]['contentDetails']['relatedPlaylists']['uploads']
    print('upload_id:', upload_id)
    
    all_videos = []
    next_page_token = None
    number = 0
    while True:
        number +=1 
        print('page', number)
        res = youtube.playlistItems().list(playlistId=upload_id, maxResults=50, part='snippet', pageToken=next_page_token).execute()
        all_videos += res['items']
    
        next_page_token = res.get('nextPageToken')
    
        if next_page_token is None:
            break
    print('len(all_videos):', len(all_videos))
    
    video_ids = list(map(lambda x: x['snippet']['resourceId']['videoId'], all_videos))
    print('len(video_ids):', len(video_ids))
    
    stats = []
    # videos().list accepts at most 50 comma-separated IDs per request; batches of 40 stay under that
    for i in range(0, len(video_ids), 40):
        res = youtube.videos().list(id=','.join(video_ids[i:i+40]), part='statistics').execute()
        stats += res['items']
    print('len(stats):', len(stats))
    
    all_rows = []
    
    number = 0
    for video, stat in zip(all_videos, stats):
        number +=1 
        print('row', number)
    
        v = video['snippet']
        s = stat['statistics']
    
        row = [
            v['title'],
            v['resourceId']['videoId'],
            v['description'],
            v['publishedAt'],
       
            int(s.get('likeCount',0)),
            int(s.get('dislikeCount',0)),
            int(s.get('viewCount',0)),
            int(s.get('commentCount',0)),
        ]
    
        all_rows.append(row)
    
    # - after `for`-loop -
    
    df = pd.DataFrame(all_rows, columns=['title', 'videoIDS', 'video_description', 'publishedDate', 'likes', 'dislikes', 'views', 'comment'])
    
    print(df.head())
    

    Result:

                                                   title     videoIDS  ... views comment
    0  NPTEL Awareness workshop in association with P...  TCMQ2NEEiRo  ...  7282       0
    1                            Cayley-Hamilton theorem  WROFJ15hk00  ...  3308       4
    2  Recap of matrix norms and Levy-Desplanques the...  WsO_s8dNfVI  ...   675       1
    3                  Convergent matrices, Banach lemma  PVGeabmeLDQ  ...   676       2
    4                  Schur's triangularization theorem  UbDwzSnS0Y0  ...   436       0
    
    [5 rows x 8 columns]
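
    A possible hardening of the last loop, assuming the rest of the script stays the same: instead of relying on zip() keeping all_videos and stats in step, index the statistics by the id field that videos().list returns for each item and look each playlist entry up by its videoId. build_rows is a hypothetical name and the sample data is made up:

```python
def build_rows(all_videos, stats):
    """Hypothetical helper: pair each playlist item with its statistics by video ID."""
    # Each item from videos().list carries the video's ID in "id".
    stats_by_id = {item["id"]: item.get("statistics", {}) for item in stats}

    rows = []
    for video in all_videos:
        v = video["snippet"]
        video_id = v["resourceId"]["videoId"]
        s = stats_by_id.get(video_id, {})  # {} if no statistics came back
        rows.append([
            v["title"],
            video_id,
            v["description"],
            v["publishedAt"],
            int(s.get("likeCount", 0)),
            int(s.get("dislikeCount", 0)),
            int(s.get("viewCount", 0)),
            int(s.get("commentCount", 0)),
        ])
    return rows


# Made-up sample: statistics exist only for the second video.
all_videos = [
    {"snippet": {"title": "Video A", "resourceId": {"videoId": "AAA"},
                 "description": "", "publishedAt": "PLACEHOLDER_A"}},
    {"snippet": {"title": "Video B", "resourceId": {"videoId": "BBB"},
                 "description": "", "publishedAt": "PLACEHOLDER_B"}},
]
stats = [{"id": "BBB", "statistics": {"viewCount": "10", "likeCount": "2"}}]

rows = build_rows(all_videos, stats)
for row in rows:
    print(row[1], row[6])  # video ID and view count
# AAA 0
# BBB 10
```

    The returned rows have the same eight columns, in the same order, as all_rows in the script above. Videos without statistics get zeros here; if you would rather flag them, check `video_id in stats_by_id` and handle that case separately.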