python reddit praw

Retrieving only entries with selftext in Reddit PRAW


I am downloading the top 100 posts from Reddit. However, many of them are external links, JPG files, or other non-textual content, so the list I get consists mostly of empty strings. Is there a way to retrieve only the entries that contain selftext? Here is my code:

import json
import nltk
import re
import pandas

appended_data = []

subreddit = reddit.subreddit('bitcoin')

# Up to 100 entries from the subreddit's "hot" listing
top_python = subreddit.hot(limit=100)

for submission in top_python:
    if not submission.stickied:
        appended_data.append(submission.selftext)

# Drop the empty strings left by posts without selftext
str_list = list(filter(None, appended_data))

Solution

  • Submissions have a built-in flag, is_self, that tells you whether a post is a text (self) post rather than a link post. The updated version of your code would look something like this:

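    import json
    import nltk
    import re
    import pandas

    # Assumes `reddit` is an already-authenticated praw.Reddit instance.
    appended_data = []

    subreddit = reddit.subreddit('bitcoin')

    # Up to 100 entries from the subreddit's "hot" listing
    top_python = subreddit.hot(limit=100)

    for submission in top_python:
        # Keep only non-stickied text (self) posts
        if not submission.stickied and submission.is_self:
            appended_data.append(submission.selftext)

    # A self post can still have an empty body, so drop empty strings
    str_list = list(filter(None, appended_data))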
    

    If you have any further questions don't hesitate to post a comment and ask!
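
If you want to check the filtering logic without hitting the Reddit API, here is a minimal sketch of the same idea as a standalone function. The name collect_selftexts and the fake_posts list are made up for illustration; the stand-in objects only mimic the three attributes used above (stickied, is_self, selftext) and are not real PRAW submissions.

```python
from types import SimpleNamespace


def collect_selftexts(submissions):
    """Return the non-empty selftext of every non-stickied self post."""
    texts = []
    for submission in submissions:
        # Skip pinned posts and link/image posts.
        if submission.stickied or not submission.is_self:
            continue
        # A self post can still be title-only, so skip blank bodies too.
        if submission.selftext.strip():
            texts.append(submission.selftext)
    return texts


# Hypothetical stand-ins that only carry the attributes the function reads.
fake_posts = [
    SimpleNamespace(stickied=False, is_self=True, selftext="Some discussion text"),
    SimpleNamespace(stickied=False, is_self=False, selftext=""),  # link post
    SimpleNamespace(stickied=True, is_self=True, selftext="Pinned rules"),
    SimpleNamespace(stickied=False, is_self=True, selftext=""),  # title-only post
]

print(collect_selftexts(fake_posts))  # prints ['Some discussion text']
```

With a real, authenticated reddit instance you should be able to pass the listing straight in, e.g. collect_selftexts(reddit.subreddit('bitcoin').hot(limit=100)), since the function only iterates over whatever it is given.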