Tags: python, try-catch, praw

Try block not catching - am I making an inadvertent internet request?


I accidentally disconnected my internet connection and received the error below. Why did this line trigger the error?

    self.content += tuple(subreddit_posts)

Or perhaps I should ask: why did the following block not lead to a sys.exit()? It seems like it should catch all errors:

    try:
        subreddit_posts = self.r.get_content(url, limit=10)
    except:
        print '*** Could not connect to Reddit.'
        sys.exit()

Does this mean I am inadvertently hitting reddit's network twice?

FYI, praw is a Reddit API client, and get_content() fetches a subreddit's posts/submissions as a generator object.
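That generator detail turns out to matter: a generator function performs no work when it is called, only when it is iterated. A minimal sketch (using a hypothetical fetch_posts() in place of praw's get_content(), since the real call needs network access) shows where an error inside a generator actually surfaces:

```python
def fetch_posts():
    """Hypothetical stand-in for praw's get_content(): a generator function."""
    raise IOError("simulated network failure")
    yield  # the yield makes this a generator function; this line never runs

# Calling the generator function performs no work and raises nothing:
posts = fetch_posts()
created_ok = True

# The body only runs once iteration starts, so the error surfaces here:
try:
    tuple(posts)
    consumed_ok = True
except IOError:
    consumed_ok = False
```

So wrapping only the call that *creates* the generator in a try block catches nothing; the exception is raised later, at the line that consumes it.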

The error message:

Traceback (most recent call last):
  File "beam.py", line 49, in <module>
    main()
  File "beam.py", line 44, in main
    scan.scanNSFW()
  File "beam.py", line 37, in scanNSFW
    map(self.getSub, self.nsfw)
  File "beam.py", line 26, in getSub
    self.content += tuple(subreddit_posts)
  File "/Library/Python/2.7/site-packages/praw/__init__.py", line 504, in get_co
    page_data = self.request_json(url, params=params)
  File "/Library/Python/2.7/site-packages/praw/decorators.py", line 163, in wrap
    return_value = function(reddit_session, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/praw/__init__.py", line 557, in reques
    retry_on_error=retry_on_error)
  File "/Library/Python/2.7/site-packages/praw/__init__.py", line 399, in _reque
    _raise_response_exceptions(response)
  File "/Library/Python/2.7/site-packages/praw/internal.py", line 178, in _raise
    response.raise_for_status()
  File "/Library/Python/2.7/site-packages/requests/models.py", line 831, in rais
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable

The script (it's short):

import sys, os, pprint, praw

class Scanner(object):
    ''' A scanner object. '''
    def __init__(self):
        self.user_agent = 'debian.22990.myapp'
        self.r = praw.Reddit(user_agent=self.user_agent)
        self.nsfw = ('funny', 'nsfw')
        self.nsfw_posters = set()
        self.content = ()

    def getSub(self, subreddit):
        ''' Accepts a subreddit. Connects to subreddit and retrieves content.
        Unpacks generator object containing content into tuple. '''
        url = 'http://www.reddit.com/r/{sub}/'.format(sub=subreddit)
        print 'Scanning:', subreddit
        try:
            subreddit_posts = self.r.get_content(url, limit=10)
        except:
            print '*** Could not connect to Reddit.'
            sys.exit()
        print 'Constructing list.',
        self.content += tuple(subreddit_posts)
        print 'Done.'

    def addNSFWPoster(self, post):
        print 'Parsing author and adding to posters.'
        self.nsfw_posters.add(str(post.author))

    def scanNSFW(self):
        ''' Scans all NSFW subreddits. Makes list of posters.'''
#       Get content from all nsfw subreddits
        print 'Executing map function.'
        map(self.getSub, self.nsfw)
#       Scan content and get authors
        print 'Executing list comprehension.'
        [self.addNSFWPoster(post) for post in self.content]

def main():
    scan = Scanner()
    scan.scanNSFW()
    for i in scan.nsfw_posters:
        print i
    print len(scan.content)

main()

Solution

  • It looks like praw fetches objects lazily: the HTTP request is only made when you actually consume subreddit_posts, which explains why the error is raised on the `self.content += tuple(subreddit_posts)` line rather than inside your try block.

    See: https://praw.readthedocs.org/en/v2.1.20/pages/lazy-loading.html
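    One way to adapt getSub() accordingly is to force the generator inside the try block, so the deferred request (and any resulting HTTPError) happens where the script expects it. A self-contained sketch, with a hypothetical failing get_content() generator standing in for praw's, since the real call needs network access:

    ```python
    def get_content(url, limit=10):
        """Hypothetical stand-in for self.r.get_content() that fails lazily,
        the way praw does when the network is down."""
        raise IOError("503 Server Error: Service Unavailable")
        yield  # never reached; makes this a generator function

    content = ()
    try:
        # tuple() consumes the generator *inside* the try block, so the
        # deferred network error is caught where the script expects it
        subreddit_posts = tuple(get_content('http://www.reddit.com/r/funny/'))
        content += subreddit_posts
    except Exception:
        handled = True  # the real script would print a message and sys.exit()
    ```

    Catching a named exception (e.g. requests.exceptions.HTTPError) instead of a bare except would also avoid silently swallowing unrelated bugs.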