I need to grab the top comments in a subreddit, from all time.
I have tried grabbing all the submissions and iterating through them, but unfortunately the number of posts you can get is limited to 1000.
I have tried using Subreddit.get_comments, but it returns only 25 comments.
So I am looking for a way around that.
Can you help me out?
You can call get_comments with the limit parameter set to None to get all available comments. (By default, it uses the amount configured for the account, which is usually 25.) (get_comments accepts the same parameters as get_content, including limit.)
However, this probably won't do what you want: get_comments (or more specifically /r/subreddit/comments) only offers a list of new comments or new gilded comments, not top comments. And since get_comments is also capped at 1000 comments, you'll have trouble building a full list of top comments.
So what you really want is your original approach: get the list of top submissions, then the top comments of those. It's not a perfect system (a low-scoring post might actually have a highly voted comment), but it's the best available given those limits.
Here's some code:
import praw

r = praw.Reddit(user_agent='top_comment_test')
subreddit = r.get_subreddit('opensource')
# For a potentially more accurate set of top comments, increase the limit (but it'll take longer)
top = subreddit.get_top(params={'t': 'all'}, limit=25)
all_comments = []
for submission in top:
    submission_comments = praw.helpers.flatten_tree(submission.comments)
    # Don't include non-comment objects such as MoreComments
    real_comments = [comment for comment in submission_comments if isinstance(comment, praw.objects.Comment)]
    all_comments += real_comments

all_comments.sort(key=lambda comment: comment.score, reverse=True)
top_comments = all_comments[:25]  # top 25 comments
print(top_comments)
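
To see the selection step on its own, without hitting the Reddit API, here is a minimal sketch that uses plain Python stand-ins. The Comment and Submission namedtuples and the top_comments helper are hypothetical names made up for illustration, not PRAW objects; they only model a score and a list of comments. It uses heapq.nlargest from the standard library instead of sorting the whole list, which is just an alternative way to pick the top N.

```python
import heapq
from collections import namedtuple

# Hypothetical stand-ins for PRAW objects; only the fields used here are modeled.
Comment = namedtuple("Comment", ["body", "score"])
Submission = namedtuple("Submission", ["title", "score", "comments"])


def top_comments(submissions, n):
    """Return the n highest-scoring comments across all given submissions."""
    all_comments = (c for s in submissions for c in s.comments)
    return heapq.nlargest(n, all_comments, key=lambda c: c.score)


# Made-up sample data: the low-scoring post holds the highest-scoring comment.
submissions = [
    Submission("popular post", 900, [Comment("a", 40), Comment("b", 15)]),
    Submission("quiet post", 12, [Comment("c", 75)]),
]

best = top_comments(submissions, 2)
print([c.body for c in best])
```

With this sample data, comment "c" comes first even though it belongs to the low-scoring post. If only the first submission had been fetched, "c" would have been missed, which is the limitation mentioned above and the reason a larger submission limit gives a more accurate result.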