Retrieving only entries with selftext (Reddit, PRAW)

I am downloading 100 posts from a subreddit with PRAW. However, many of them are external links, JPG files, or other non-textual content, so the list I get consists mostly of empty strings. Is there a way to retrieve only the entries that contain selftext? Here is my code:

import json
import nltk
import re
import pandas

appended_data = []

subreddit = reddit.subreddit('bitcoin') 

top_python = subreddit.hot(limit=100)  # up to 100 entries

for submission in top_python:
    if not submission.stickied:
        appended_data.append(submission.selftext)

str_list = list(filter(None, appended_data))

Solution

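Each submission has a built-in attribute, is_self, that tells you whether it is a text (self) post. A self post can still have an empty body, so the final filter on empty strings is still useful. The updated version of your code would look like this: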

import json 
import nltk 
import re 
import pandas

appended_data = []

# reddit is assumed to be an already authenticated praw.Reddit instance
subreddit = reddit.subreddit('bitcoin')

top_python = subreddit.hot(limit=100)  # up to 100 entries

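for submission in top_python:
    # Skip pinned posts and anything that is not a text (self) post
    if not submission.stickied and submission.is_self:
        appended_data.append(submission.selftext)

# Drop any remaining empty strings (self posts with a title but no body)
str_list = list(filter(None, appended_data))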
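
If you want to try the filtering logic without Reddit credentials, here is a minimal sketch of the same check wrapped in a function. The function name collect_selftexts and the stand-in objects are hypothetical and only for illustration. They model just the three attributes used above (stickied, is_self, selftext), not real PRAW submissions.

```python
from types import SimpleNamespace


def collect_selftexts(submissions):
    """Return the non-empty selftext of every non-stickied self post."""
    return [
        s.selftext
        for s in submissions
        if not s.stickied and s.is_self and s.selftext
    ]


# Hypothetical stand-ins for PRAW submissions (assumption: only these
# three attributes matter for the filter).
fake_submissions = [
    SimpleNamespace(stickied=True, is_self=True, selftext="Daily discussion"),
    SimpleNamespace(stickied=False, is_self=False, selftext=""),  # link post
    SimpleNamespace(stickied=False, is_self=True, selftext=""),   # title-only self post
    SimpleNamespace(stickied=False, is_self=True, selftext="Some body text"),
]

str_list = collect_selftexts(fake_submissions)
print(str_list)  # ['Some body text']
```

With a live client you would pass subreddit.hot(limit=100) instead of the stand-in list. Checking s.selftext inside the comprehension replaces the separate filter(None, ...) step.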

If you have any further questions don't hesitate to post a comment and ask!