Tags: python, error-handling, runtime, reddit

How do I avoid a sporadic KeyError: 'data' when using the Reddit API in Python?


I have the following Python code, which works for hitting Reddit's API to look up the front page and rising submissions of different subreddits.

from pprint import pprint
import requests
import json
import datetime
import csv
import time

subredditsToScan = ["Arts", "AskReddit", "askscience", "aww", "books", "creepy", "dataisbeautiful", "DIY", "Documentaries", "EarthPorn", "explainlikeimfive", "food", "funny", "gaming", "gifs", "history", "jokes", "LifeProTips", "movies", "music", "pics", "science", "ShowerThoughts", "space", "sports", "tifu", "todayilearned", "videos", "worldnews"]

ofilePosts = open('posts.csv', 'wb')
writerPosts = csv.writer(ofilePosts, delimiter=',')

ofileUrls = open('urls.csv', 'wb')
writerUrls = csv.writer(ofileUrls, delimiter=',')

for subreddit in subredditsToScan:
    front = requests.get(r'http://www.reddit.com/r/' + subreddit + '/.json')
    rising = requests.get(r'http://www.reddit.com/r/' + subreddit + '/rising/.json')

    front.text
    rising.text

    risingData = rising.json()
    frontData = front.json()

    print(len(risingData['data']['children']))
    print(len(frontData['data']['children']))
    for i in range(0, len(risingData['data']['children'])):
        author = risingData['data']['children'][i]['data']['author']
        score = risingData['data']['children'][i]['data']['score']
        subreddit = risingData['data']['children'][i]['data']['subreddit']
        gilded = risingData['data']['children'][i]['data']['gilded']
        numOfComments = risingData['data']['children'][i]['data']['num_comments']
        linkUrl = risingData['data']['children'][i]['data']['permalink']
        timeCreated = risingData['data']['children'][i]['data']['created_utc']

        writerPosts.writerow([author, score, subreddit, gilded, numOfComments, linkUrl, timeCreated])
        writerUrls.writerow([linkUrl])



    for j in range(0, len(frontData['data']['children'])):
        author = frontData['data']['children'][j]['data']['author'].encode('utf-8').strip()
        score = frontData['data']['children'][j]['data']['score']
        subreddit = frontData['data']['children'][j]['data']['subreddit'].encode('utf-8').strip()
        gilded = frontData['data']['children'][j]['data']['gilded']
        numOfComments = frontData['data']['children'][j]['data']['num_comments']
        linkUrl = frontData['data']['children'][j]['data']['permalink'].encode('utf-8').strip()
        timeCreated = frontData['data']['children'][j]['data']['created_utc']

        writerPosts.writerow([author, score, subreddit, gilded, numOfComments, linkUrl, timeCreated])
        writerUrls.writerow([linkUrl])

It works well and scrapes the data accurately, but it constantly gets interrupted, seemingly at random, with a runtime crash:

Traceback (most recent call last):
  File "dataGather1.py", line 27, in <module>
    for i in range(0, len(risingData['data']['children'])):
KeyError: 'data'

I have no idea why this error occurs intermittently rather than consistently. I thought maybe I was calling the API too often and it was blocking me, so I added a sleep to my code, but that did not help. Any ideas?


Solution

  • When the API returns a response with no data (for example, when Reddit rate-limits your requests and sends back an error payload instead of a listing), there is no 'data' key in the resulting dictionary, so indexing risingData['data'] raises a KeyError for some subreddits. You need to wrap the access in a try/except (Python's try/catch), or check that the key exists before indexing into it.
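
A minimal sketch of that pattern, reusing subredditsToScan, writerPosts, and writerUrls from the question. The retry count, backoff interval, and User-Agent string are illustrative choices, not requirements; Reddit throttles the default python-requests User-Agent aggressively, which is a likely source of the empty error responses in the first place.

import time
import requests

# Hypothetical descriptive User-Agent; the default python-requests
# agent is heavily rate-limited by Reddit.
HEADERS = {'User-Agent': 'subreddit-scraper/0.1 (educational script)'}

def get_children(url, retries=3, backoff=5):
    """Fetch a Reddit listing and return its posts, or [] after retries."""
    for _ in range(retries):
        response = requests.get(url, headers=HEADERS)
        try:
            payload = response.json()
            # Error responses carry no 'data' key; indexing it raises
            # KeyError, and we pause before retrying instead of crashing.
            return payload['data']['children']
        except (KeyError, ValueError):
            time.sleep(backoff)
    return []

for subreddit in subredditsToScan:
    rising = get_children('http://www.reddit.com/r/' + subreddit + '/rising/.json')
    for child in rising:
        post = child['data']
        writerPosts.writerow([post['author'], post['score'], post['subreddit'],
                              post['gilded'], post['num_comments'],
                              post['permalink'], post['created_utc']])
        writerUrls.writerow([post['permalink']])

The same guard works with a membership test (if 'data' in payload:) if you prefer to avoid exceptions; either way, the point is to treat a missing 'data' key as a retryable failure rather than letting it crash the whole run.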