I was trying to get all the /r/politics posts for the last two months with all the comments and the user details. How do I do this using PRAW?
Should I go through the posts in get_hot()
? Any ideas on how to go about this? Are there any time-stamp methods that I can take advantage of?
You can use the Cloudsearch syntax to search between 2 timestamps. The syntax is:
timestamp:START_UNIX_TIMESTAMP..END_UNIX_TIMESTAMP
If you are expecting less than 1000 entries to be returned, it should be relatively straightforward just to set up a search to do this. Search queries are limited to 1000 requests, though, so some special logic is required if more posts than this are expected.
To perform a search for any post in the last 2 months, try:
import time
current_timestamp = time.time()
# 60 seconds * 60 minutes * 24 hours * 60 days = 2 months
two_months_timestamp = current_timestamp - (60 * 60 * 24 * 60)
query = 'timestamp:{}..{}'.format(current_timestamp, two_months_timestamp)
results = reddit.subreddit('test').search(query, sort='new')
If you need to get more than 1000, I would suggest getting the timestamp of the last item in the search results, then storing the timestamp of that and searching for timestamp:<current_timestamp>..<last_item_timestamp>
and repeat until there are no more results.