I want to run a script in the background that fetches subreddit data every hour or so. Since I don't want duplicate entries in my DB, I want to filter my search results based on created_utc.
This is what I have currently:
r = praw.Reddit(user_agent='soc')
submissions = r.get_subreddit('soccer').get_hot()
And this is what I want to have:
r = praw.Reddit(user_agent='soc')
submissions = r.get_subreddit('soccer').get_hot(created_utc > '2016-02-18 14:33:14.000')
What are the ways to achieve this?
Neither the Subreddit class nor the Reddit API has the date-based filter methods that you want, so here is one option for you:
Filter the results in Python before you put them into your DB. get_hot and get_new return generator objects, so you can use a list comprehension like this:
from datetime import datetime, timedelta
import praw

# assuming you run this script every hour
an_hour_ago = datetime.utcnow() - timedelta(hours=1)
r = praw.Reddit(user_agent='soc')
submissions = r.get_subreddit('soccer').get_new()
submissions_list = [
    # iterate through the submissions generator object
    x for x in submissions
    # add item if item.created_utc is newer than an hour ago
    if datetime.utcfromtimestamp(x.created_utc) >= an_hour_ago
]
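A fixed one-hour window can skip or repeat posts if the script runs a little late or early. Here is a minimal sketch of the same filter driven by the timestamp of the newest post you already stored, using stand-in objects so it runs without a Reddit connection. Post, newer_than, and last_seen_utc are names made up for this illustration; the only thing taken from the code above is the created_utc attribute (epoch seconds).

```python
from collections import namedtuple

# Stand-in for a PRAW submission (hypothetical); only created_utc is assumed.
Post = namedtuple("Post", ["title", "created_utc"])


def newer_than(submissions, last_seen_utc):
    """Return the submissions created strictly after last_seen_utc."""
    return [s for s in submissions if s.created_utc > last_seen_utc]


# Fake data, newest first, the way a "new" listing is ordered.
posts = [
    Post("c", 1455806000.0),
    Post("b", 1455805000.0),
    Post("a", 1455804000.0),
]

# last_seen_utc would be the newest created_utc already saved in your DB.
fresh = newer_than(posts, last_seen_utc=1455804500.0)
print([p.title for p in fresh])
```

With real data you would pass the get_new() generator as the first argument and save the largest created_utc you inserted for the next run.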
By default Reddit returns only 25 items per listing, so if you need more than that, you'll have to paginate:
limit = 100  # Reddit maximum limit
total_list = []
submissions = r.get_subreddit('soccer').get_new(limit=limit)
submissions_list = [
    x for x in submissions
    if datetime.utcfromtimestamp(x.created_utc) >= an_hour_ago
]
total_list += submissions_list
# a full page of matches means there may be more posts within the hour
if len(submissions_list) == limit:
    submissions = r.get_subreddit('soccer').get_new(
        # get limit of items past the last item in the total list
        limit=limit, params={"after": total_list[-1].fullname}
    )
    submissions_list_2 = [
        # iterate through the submissions generator object
        x for x in submissions
        # add item if item.created_utc is newer than an hour ago
        if datetime.utcfromtimestamp(x.created_utc) >= an_hour_ago
    ]
    total_list += submissions_list_2
print(total_list)
If the number of submissions is greater than 200, you'll have to put that in a recursive function like this: subreddit_latest.py
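The linked script is not reproduced here. As a rough sketch of the same idea, a loop can stand in for the recursion: keep requesting the page after the last item until a page comes back short or contains something older than the cutoff. fetch_page is a hypothetical callable standing in for the get_new(limit=..., params={"after": ...}) call above, and Item, collect_since, and the fake listing exist only so the sketch runs offline.

```python
from collections import namedtuple

# Stand-in for a submission (hypothetical); mirrors fullname and created_utc.
Item = namedtuple("Item", ["fullname", "created_utc"])


def collect_since(fetch_page, cutoff_utc, limit=100):
    """Collect items at or after cutoff_utc, page by page, newest first."""
    collected = []
    after = None
    while True:
        page = fetch_page(after, limit)
        fresh = [x for x in page if x.created_utc >= cutoff_utc]
        collected.extend(fresh)
        # Stop when the page was short or held anything older than the cutoff.
        if len(page) < limit or len(fresh) < len(page):
            break
        after = page[-1].fullname
    return collected


# Fake listing: 250 items, newest first, one second apart.
FAKE = [Item("t3_%d" % i, 1000.0 - i) for i in range(250)]


def fake_fetch(after, limit):
    """Return the slice of FAKE that follows the item named by `after`."""
    start = 0
    if after is not None:
        start = [x.fullname for x in FAKE].index(after) + 1
    return FAKE[start:start + limit]


result = collect_since(fake_fetch, cutoff_utc=1000.0 - 229)
print(len(result))
```

The loop makes one request per page and stops at the first page that reaches past the cutoff, so it does not depend on knowing in advance how many submissions there are.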