python web-scraping reddit praw

How to scrape a list of saved reddit posts to a txt file using praw in Python


I'm attempting to write a simple scraper that dumps my saved Reddit posts to a txt file, and I'm struggling to get the script to do what I want.

Here's some context. The script below dumps all of my saved post IDs into a text file, each one on its own line.

import praw
import os
import sys

reddit = praw.Reddit(client_id='MY_CLIENT_id',
                     client_secret='TOP_SECRET',
                     user_agent='AGENT_HERE',
                     username='USERNAME',
                     password='PASSWORD')

#open text file
sys.stdout = open('test.txt', 'w')
# get user saved item ids
for item in reddit.user.me().saved(limit=None):
    print(item.id)
# print to file
sys.stdout.close()

This gives me a list of post IDs that looks something like this:

lkj34f
ou456d
ho34oo
5j0vr4

I can then pass each of those IDs to the snippet below to get the actual content I want:

submission = reddit.submission(id="dg23y6")
print(submission.title)
print(submission.url)

My first question is - is there a way to open the output file, read each of the lines there and pass it as the id for the submission variable?

I'm sure there's an easier way to get this of course, I have seen several existing scripts like this dumping all content into a nicely formatted HTML file, but I'm not quite there yet, so trying to resolve this challenge using my somewhat limited skillset. I think the most obvious solution would be to use print(actual.command.I.am.missing) in place of print(item.id), but no idea how to find it.

Thanks in advance!


Solution

  • Both answers that have been submitted so far have the right idea, but make a mistake in how they use PRAW. They ignore the fact that your saved items are both comments and posts. Then, they both have a line like

    submission = reddit.submission(id=item.id)
    

    This creates a PRAW Submission object by using the ID of a pre-existing object, which is either a Submission or a Comment object. In the case that it's a Submission, the new Submission object is identical to the one it's created from, so it's redundant. In the case that it's a Comment, the behavior is incorrect because you're treating a comment ID as if it's a submission ID.
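
A bare ID on its own does not say whether it belongs to a comment or a post. Reddit's API marks the type with a prefix on the item's "fullname" (`t1_` for comments, `t3_` for posts), which PRAW exposes as `item.fullname`. Here is a minimal sketch of how that prefix could be used, assuming you wrote fullnames rather than bare IDs to the file. `KIND_BY_PREFIX` and `split_fullname` are hypothetical names made up for this illustration, not part of PRAW.

```python
# Reddit fullname type prefixes: t1 = comment, t3 = post (submission).
# KIND_BY_PREFIX and split_fullname are illustrative helpers, not PRAW API.
KIND_BY_PREFIX = {"t1": "comment", "t3": "submission"}


def split_fullname(fullname):
    """Split a fullname such as 't3_dg23y6' into (kind, bare_id)."""
    prefix, _, bare_id = fullname.partition("_")
    if not bare_id or prefix not in KIND_BY_PREFIX:
        raise ValueError(f"unrecognised fullname: {fullname!r}")
    return KIND_BY_PREFIX[prefix], bare_id


print(split_fullname("t3_dg23y6"))  # ('submission', 'dg23y6')
print(split_fullname("t1_abc123"))  # ('comment', 'abc123')
```

With the kind known, a comment ID would never be handed to `reddit.submission(...)` by accident.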

    It's not clear exactly what you want to happen with comments, so I'll do it two ways. First, here's how to do it if you want to ignore saved comments (this is much like the existing answers, but with a type check added and the redundant line removed):

    import praw
    
    reddit = praw.Reddit(client_id='MY_CLIENT_id',
                         client_secret='TOP_SECRET',
                         user_agent='AGENT_HERE',
                         username='USERNAME',
                         password='PASSWORD')
    
    with open('test.txt', 'w') as f:
        for item in reddit.user.me().saved(limit=None):
            if isinstance(item, praw.models.Submission):
                f.write(item.id + '\n')
                f.write(item.title + '\n')
                if item.is_self:
                    f.write(item.selftext + '\n')
                else: # link post
                    f.write(item.url + '\n')  # newline so the next item starts on its own line
    

    And here's how to do it where you also save comments:

    import praw
    
    reddit = praw.Reddit(client_id='MY_CLIENT_id',
                         client_secret='TOP_SECRET',
                         user_agent='AGENT_HERE',
                         username='USERNAME',
                         password='PASSWORD')
    
    with open('test.txt', 'w') as f:
        for item in reddit.user.me().saved(limit=None):
            if isinstance(item, praw.models.Submission):
                f.write(item.id + '\n')
                f.write(item.title + '\n')
                if item.is_self:
                    f.write(item.selftext + '\n')
                else: # link post
                    f.write(item.url + '\n')  # newline so the next item starts on its own line
            else: # comment
                f.write(item.id + '\n')
                f.write(item.body + '\n')
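
To answer the first question literally: yes, you can read the IDs back from the file and look each one up. The sketch below is one way that might look, assuming the file contains only submission IDs (no comment IDs) and that `reddit` is the authenticated instance from the question. `read_ids` and `dump_submissions` are hypothetical helper names, and `posts.txt` is a made-up output filename.

```python
def read_ids(path):
    """Return the non-empty, whitespace-stripped lines of a text file."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def dump_submissions(reddit, ids, out):
    """Write the title and URL of each submission ID to the file-like `out`."""
    for post_id in ids:
        # Assumes every ID in the list is a submission ID, not a comment ID.
        submission = reddit.submission(id=post_id)
        out.write(submission.title + "\n")
        out.write(submission.url + "\n")


# Usage, with `reddit` built as in the question:
# with open("posts.txt", "w") as out:
#     dump_submissions(reddit, read_ids("test.txt"), out)
```

Looking each ID up again costs an extra request per post, so writing the details while iterating over `saved()` is likely the cheaper route when you don't need the intermediate file.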