python, reddit, praw

Praw: How to filter search results based on created date?


I want a script running in the background that fetches subreddit data every hour or so. Since I don't want duplicate entries in my DB, I want to filter the results by created_utc.

This is what I have currently:

import praw

r = praw.Reddit(user_agent='soc')
submissions = r.get_subreddit('soccer').get_hot()

And this is what I want to have:

r = praw.Reddit(user_agent='soc')
submissions = r.get_subreddit('soccer').get_hot(created_utc > '2016-02-18 14:33:14.000')

What are the ways to achieve this?


Solution

  • Neither the Subreddit class nor the Reddit API has a date-based filter like the one you want, so here is one option:

    Filter the results out in Python before you put them into your DB. get_hot and get_new return generator objects, so you can use a list comprehension like this:

    from datetime import datetime, timedelta
    import praw
    
    # assuming you run this script every hour
    an_hour_ago = datetime.utcnow() - timedelta(hours=1)
    r = praw.Reddit(user_agent='soc')
    submissions = r.get_subreddit('soccer').get_new()
    submissions_list = [
        # iterate through the submissions generator object
        x for x in submissions
        # add item if item.created_utc is newer than an hour ago
        if datetime.utcfromtimestamp(x.created_utc) >= an_hour_ago
    ]
    

    By default Reddit only returns 25 items per listing request, so if you need more than that, you'll have to paginate.

    limit = 100  # maximum number of items Reddit returns per request
    total_list = []
    submissions = r.get_subreddit('soccer').get_new(limit=limit)
    submissions_list = [
        x for x in submissions
        if datetime.utcfromtimestamp(x.created_utc) >= an_hour_ago
    ]
    total_list += submissions_list
    # a full page of recent items means there may be more on the next page
    if len(submissions_list) == limit:
        submissions = r.get_subreddit('soccer').get_new(
            # get the next page of items after the last item in the total list
            limit=limit, params={"after": total_list[-1].fullname}
        )
        submissions_list_2 = [
            x for x in submissions
            if datetime.utcfromtimestamp(x.created_utc) >= an_hour_ago
        ]
        total_list += submissions_list_2
    print(total_list)
    

    If there are more than 200 new submissions, you'll have to put that logic in a recursive function like this: subreddit_latest.py
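The date filter above can also be written so that it stops at the first submission older than the cutoff instead of scanning the whole listing. The following is a minimal sketch, not the contents of subreddit_latest.py: it assumes the listing is ordered newest first (as a "new" listing normally is), and the names FakeSubmission, to_timestamp and newer_than are hypothetical helpers introduced here for illustration only.

```python
from collections import namedtuple
from datetime import datetime, timedelta
from itertools import takewhile

# Hypothetical stand-in for a PRAW submission; it carries only the two
# attributes this sketch reads.
FakeSubmission = namedtuple("FakeSubmission", ["fullname", "created_utc"])

EPOCH = datetime(1970, 1, 1)


def to_timestamp(dt):
    """Convert a naive UTC datetime to seconds since the Unix epoch."""
    return (dt - EPOCH).total_seconds()


def newer_than(submissions, cutoff):
    """Return the leading submissions created at or after `cutoff`.

    Assumption: `submissions` is ordered newest first, so iteration can
    stop at the first item that is older than the cutoff.
    """
    cutoff_ts = to_timestamp(cutoff)
    return list(takewhile(lambda s: s.created_utc >= cutoff_ts, submissions))


if __name__ == "__main__":
    now = datetime(2016, 2, 18, 14, 33, 14)
    # Fake listing, newest first: items posted 5, 30, 59, 61 and 120 minutes ago.
    listing = [
        FakeSubmission("t3_%d" % minutes, to_timestamp(now - timedelta(minutes=minutes)))
        for minutes in (5, 30, 59, 61, 120)
    ]
    recent = newer_than(listing, now - timedelta(hours=1))
    print([s.fullname for s in recent])
```

With the PRAW 3 calls used in the answer, the first argument would be something like r.get_subreddit('soccer').get_new(limit=None). This assumes that limit=None makes PRAW keep requesting further pages lazily as the generator is consumed, in which case stopping early also avoids fetching pages that are not needed.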