python, python-3.x, praw

Python times out after if in for loop


Foreword: this isn't just a PRAW question; it leans toward Python more than PRAW. Python people are welcome to contribute, and please note this is not my mother language xD!

Essentially, I'm writing a Reddit bot using PRAW that does the following:

  • Loop through "unsaved" posts
  • Loop through the comments of said posts (targeting subcomments)
  • If the comment contains "!completed", is written by the submitter OR a moderator, and the parent comment is not by the submitter:
  • Do something, e.g. print("Hey")

No, I didn't explain that too well. Examples are better, so here xD:

Use cases:

- Post by @dudeOne
 - Comment by @dudeTwo
  - Comment with "!completed" by @dudeOne
- Post by @dudeOne
 - Comment by @dudeTwo
  - Comment with "!completed" by @moderatorOne

... should print("Hey"), and:

- Post by @dudeOne
 - Comment by @dudeOne
  - Comment with "!completed" by @dudeOne

... does nothing, maybe even removes + messages @dudeOne.

Here's my messy code (xD):

import praw
import os
import re

sub = "RedditsQuests"

client_id = os.environ.get('client_id')
client_secret = os.environ.get('client_secret')
password = os.environ.get('pass')

reddit = praw.Reddit(client_id=client_id,
                     client_secret=client_secret,
                     password=password,
                     user_agent='r/RedditsQuests bot',
                     username='TheQuestMaster')

for submission in reddit.subreddit(sub).new(limit=None):
    submission.comments.replace_more(limit=None)
    if submission.saved is False:
        for comment in submission.comments.list():
            if ((("!completed" in comment.body)) and ((comment.is_submitter) or ('RedditsQuests' in comment.author.moderated())) and (comment.parent().author.name is not submission.author.name)):
              print("etc...")

There's a decently-sized stack trace, so I've added it in this bin for your reference. To me it looks like PRAW is timing out because the if-in-for loop is taking too long. I could be wrong though!


Solution

  • The issue (as you've said) is somewhat sporadic but I've narrowed it down. As it turns out, trying to fetch the subreddits moderated by /u/AutoModerator will sometimes time out (presumably because the list is long).

    Figuring out the issue

    Here's how I found the issue. Skip this section if you're only interested in the solution.

    First, I modified your script to use try and except to catch the exception when it happened. Your traceback told me that it was happening on the line that starts with if ((("!completed" in comment.body)), specifically when fetching the subreddits that a user moderates. Here was my modified script:

    for submission in reddit.subreddit(sub).new(limit=None):
        submission.comments.replace_more(limit=None)
        if submission.saved is False:
            for comment in submission.comments.list():
                try:
                    if (
                        (("!completed" in comment.body))
                        and (
                            (comment.is_submitter)
                            or ("RedditsQuests" in comment.author.moderated())
                        )
                        and (comment.parent().author.name is not submission.author.name)
                    ):
                        print("etc...")
                except Exception:
                    print(f'Author: {comment.author} ({type(comment.author)})')
    

    And the output:

    etc...
    etc...
    Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
    etc...
    etc...
    etc...
    Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
    etc...
    etc...
    etc...
    etc...
    etc...
    etc...
    etc...
    Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
    etc...
    Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
    etc...
    etc...
    

    With this in mind I wrote a very simple 3-line script to reproduce the issue:

    import praw
    
    reddit = praw.Reddit(...)
    
    print(reddit.redditor("AutoModerator").moderated())
    

    Sometimes this script would succeed but sometimes it would fail with the same socket read timeout. Presumably the timeout happens because AutoModerator moderates so many subreddits (at least 10,000), and the Reddit API takes too long to process the request.
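
    Since the failure is sporadic, a single run doesn't tell you much. Here is a minimal sketch of how one might count how often a call fails over several attempts. Both count_outcomes and make_flaky are hypothetical helpers written for this illustration (not part of PRAW), and the flaky function only simulates a timeout so the example runs without network access:

```python
def count_outcomes(call, attempts):
    """Run `call` several times and return (successes, failures)."""
    successes = failures = 0
    for _ in range(attempts):
        try:
            call()
            successes += 1
        except Exception:
            # Any exception counts as a failed attempt.
            failures += 1
    return successes, failures


def make_flaky(fail_every):
    """Build a stand-in callable that raises on every `fail_every`-th call."""
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] % fail_every == 0:
            raise TimeoutError("simulated read timeout")
        return "ok"

    return flaky


outcome = count_outcomes(make_flaky(3), 9)
print(outcome)  # (6, 3)
```

    With real credentials, the same helper could presumably be given lambda: reddit.redditor("AutoModerator").moderated() in place of the stand-in.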

    Fixing the issue

    Your script tries to determine whether the redditor in question is a moderator of the subreddit. You're doing this by checking if the subreddit is in the list of the user's moderated subreddits, but you can switch this to checking if the user is in the list of the subreddit's moderators. Not only should this not time out, but you'll be saving a lot of network requests because you can just fetch the list of moderators once.
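
    To make the saving concrete, here is a rough sketch that only counts simulated requests. FakeApi, its methods, and the user names are stand-ins invented for this example; nothing here talks to Reddit or uses PRAW:

```python
# Stand-in data: which subreddits each (hypothetical) user moderates.
MODERATED_BY_USER = {
    "moderatorOne": ["RedditsQuests"],
    "dudeOne": [],
    "dudeTwo": [],
}


class FakeApi:
    """Counts simulated network requests; not a PRAW class."""

    def __init__(self):
        self.requests = 0

    def subreddits_moderated_by(self, user):
        self.requests += 1
        return MODERATED_BY_USER.get(user, [])

    def moderators_of(self, subreddit):
        self.requests += 1
        return [u for u, subs in MODERATED_BY_USER.items() if subreddit in subs]


commenters = ["dudeOne", "dudeTwo", "moderatorOne", "dudeOne", "dudeTwo"]

# Per-comment lookup: one simulated request for every comment.
per_comment = FakeApi()
slow = ["RedditsQuests" in per_comment.subreddits_moderated_by(u) for u in commenters]

# Upfront lookup: one simulated request, then plain set membership.
upfront = FakeApi()
mods = set(upfront.moderators_of("RedditsQuests"))
fast = [u in mods for u in commenters]

print(per_comment.requests, upfront.requests)  # 5 1
```

    Both approaches give the same answers here; the difference is only in how many requests are made.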

    The PRAW documentation of Subreddit shows how we can get a list of moderators of a subreddit. In your case, we can do

    moderators = list(reddit.subreddit(sub).moderator())
    

    Then, instead of checking "RedditsQuests" in comment.author.moderated(), we check

    comment.author in moderators
    

    Your code then becomes

    import praw
    import os
    import re
    
    sub = "RedditsQuests"
    
    client_id = os.environ.get("client_id")
    client_secret = os.environ.get("client_secret")
    password = os.environ.get("pass")
    
    reddit = praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        password=password,
        user_agent="r/RedditsQuests bot",
        username="TheQuestMaster",
    )
    
    moderators = list(reddit.subreddit(sub).moderator())
    for submission in reddit.subreddit(sub).new(limit=None):
        submission.comments.replace_more(limit=None)
        if submission.saved is False:
            for comment in submission.comments.list():
                if (
                    (("!completed" in comment.body))
                    and ((comment.is_submitter) or (comment.author in moderators))
                    # Compare names with != ("is not" tests object identity, not equality)
                    and (comment.parent().author.name != submission.author.name)
                ):
                    print("etc...")
    

    In my brief testing, this script runs many times faster, since we only get the list of moderators once, rather than fetching all subreddits moderated by all users who commented.
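
    If you want to check the condition against the use cases from your question without hitting Reddit, one option is to pull it out into a small function that works on plain strings. This is only a sketch: should_act and its parameters are names I made up, and returning False when an author is missing (deleted accounts show up with author set to None in PRAW) is my assumption about what you'd want:

```python
def should_act(body, commenter, parent_author, submitter, moderator_names):
    """Return True when a "!completed" comment should trigger the bot."""
    if commenter is None or parent_author is None:
        # Assumption: skip comments whose author or parent author is deleted.
        return False
    if "!completed" not in body:
        return False
    # The commenter must be the submitter or one of the moderators...
    allowed = commenter == submitter or commenter in moderator_names
    # ...and the parent comment must not be by the submitter.
    return allowed and parent_author != submitter


mods = {"moderatorOne"}

# The three use cases from the question:
case_one = should_act("!completed", "dudeOne", "dudeTwo", "dudeOne", mods)
case_two = should_act("!completed", "moderatorOne", "dudeTwo", "dudeOne", mods)
case_three = should_act("!completed", "dudeOne", "dudeOne", "dudeOne", mods)
print(case_one, case_two, case_three)  # True True False
```

    In the loop you would pass in comment.body and the author names, converting a missing author to None first.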


    As an unrelated style note, instead of if submission.saved is False you should do if not submission.saved, which is the conventional way to check if a condition is false.
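
    A short sketch of why this matters, using made-up values. The two spellings agree for a real boolean but differ for other falsy values, and a similar caveat applies to comparing strings such as author names with is / is not, which checks object identity rather than equality:

```python
saved = False
# For an actual bool, both spellings agree.
same_for_bool = (saved is False) == (not saved)

# For other falsy values they differ: "is False" is only true for False itself.
falsy_values = [0, "", None, []]
identity_checks = [value is False for value in falsy_values]
truthiness_checks = [not value for value in falsy_values]
print(identity_checks)    # [False, False, False, False]
print(truthiness_checks)  # [True, True, True, True]

# Two equal strings are not guaranteed to be the same object,
# so names should be compared with == / != rather than is / is not.
name_a = "dudeOne"
name_b = "".join(["dude", "One"])  # built at runtime, equal in value
names_equal = name_a == name_b
print(names_equal)  # True
```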