Tags: python-3.x, praw, reddit

PRAW Requesting All Subreddit Posts: Receive 401 Error


I'm trying to grab every single post from a subreddit, all the way back to its beginning in 2010, specifically /r/nosleep.

My code for grabbing those posts is the usual:

for submission in nosleep.submissions(end=int(time.time())):

It works at first: I've checked my credentials and they are all valid, and it will easily grab two years' worth of posts without any issue. But when I run the for loop above over the full range, at some point near the end it returns a 401 and crashes the entire program.

I've checked and confirmed the following scenarios:

  • It will grab posts from 2010 to 2011 without trouble, so the problem is not that it hits the "start" of the subreddit and thinks it's forbidden to grab posts from before the subreddit began.
  • I've printed out reddit.auth.limits on each iteration of the loop, and every value comes back as None, so I'm not running out of request allowances.

The only "hack" around this is to split up the work into two for loops, dividing the time range that ends at int(time.time()) into two (or more) pieces and iterating over each like this:

for submission in nosleep.submissions(start=middle, end=int(time.time())):
for submission in nosleep.submissions(end=middle):
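
The two-loop split above can be generalized to any number of pieces. What follows is only a minimal sketch: `split_time_range` is a hypothetical helper (not part of PRAW), and the epoch bounds you pass in are your own assumption about where the subreddit starts and ends.

```python
def split_time_range(start, end, pieces):
    """Split [start, end] into `pieces` contiguous (start, end) windows.

    Hypothetical helper, not part of PRAW. Timestamps are epoch seconds.
    Windows are returned newest first, and neighbours share a boundary
    value, just as `middle` is shared by the two loops above.
    """
    if pieces < 1:
        raise ValueError("pieces must be at least 1")
    if end <= start:
        raise ValueError("end must be greater than start")
    span = end - start
    # Integer arithmetic keeps every boundary an exact epoch second.
    bounds = [start + span * i // pieces for i in range(pieces)] + [end]
    windows = [(bounds[i], bounds[i + 1]) for i in range(pieces)]
    return windows[::-1]


# Intended use (assumes `nosleep` and a known start timestamp exist):
# for lo, hi in split_time_range(subreddit_start, int(time.time()), 4):
#     for submission in nosleep.submissions(start=lo, end=hi):
#         ...
```

Because neighbouring windows share a boundary, a post created exactly on that second could show up twice, so keeping a set of seen submission ids is probably worthwhile.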

Even then, it sometimes returns a 401. I suspect it's because of how long the loop runs, but I don't know. Does anyone have any suggestions for a different method, or for editing the PRAW source to accommodate this?


Solution

  • Try the latest development version of PRAW (pip install --upgrade https://github.com/praw-dev/praw/archive/master.zip) as this issue should be resolved.
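
If upgrading is not an option, one possible stopgap is to retry a failed chunk of work instead of letting the whole program crash. The sketch below is generic Python, not PRAW API: `run_with_retries` is a hypothetical helper, and which exception type the 401 surfaces as in your PRAW version is an assumption you would need to check and pass in via `retry_on`.

```python
import time


def run_with_retries(task, retries=3, delay=1.0, retry_on=(Exception,)):
    """Call task() and return its result, retrying on the listed exceptions.

    Hypothetical helper, not part of PRAW. `task` is any zero-argument
    callable, e.g. one that collects the submissions of a single time
    window into a list. The wait grows linearly with each failed attempt.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return task()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay * attempt)
```

A retried task starts its window over from the beginning, so the task should build and return a fresh list on each call rather than append to shared state.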