Tags: python, analytics, reddit, data-analysis

Finding low-voted posts on Reddit


The Reddit API provides information on the score of any given post, including the number of upvotes, the number of downvotes, and the overall score. I want to use this information to analyze story titles and, eventually, the content each story links to (self post, blog article, whatever) to try to predict which posts will be a hit and which will be a miss.

Reddit's API provides easy access to the highest scoring posts from any given subreddit (including r/all), but there isn't an easy way to find posts with a low score, especially given that there are different types of low scores.

For example, you could have a story that is new and has 0 ups, 0 downs, and a 0 score. Is this story a flop? Not necessarily. It's just new. However, because of the way Reddit works, a story could have 0 ups, 50 downs, and a 0 score. Chances are that this post was hateful, spam, or something otherwise meant to troll. I think I need to differentiate between these two types of stories to get a more accurate representation.
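One way to express that distinction is to look at the total number of votes before looking at the balance between them. The following is only a minimal sketch: the function name `classify_post`, the labels, and the `min_votes` threshold are all hypothetical choices for illustration, and it assumes the `ups` and `downs` values are available as described above.

```python
def classify_post(ups, downs, min_votes=5):
    """Label a post using its vote counts rather than its net score alone.

    min_votes is an arbitrary, hypothetical cutoff: posts with fewer total
    votes are treated as too new to judge.
    """
    total_votes = ups + downs
    if total_votes < min_votes:
        return "unrated"  # e.g. a brand-new post with 0 ups and 0 downs
    if ups > downs:
        return "hit"
    return "flop"  # e.g. 0 ups and 50 downs


print(classify_post(0, 0))     # new post, no votes yet
print(classify_post(0, 50))    # heavily downvoted post
print(classify_post(120, 10))  # well-received post
```

The threshold would need tuning per subreddit, since a quiet subreddit may never give a post more than a handful of votes.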

I want the top 10% and the lowest 10% of stories score-wise, so if you know of a way to find the total number of stories submitted to a subreddit, I'd love to hear about it!

What is the best way to go about finding stories that have low scores? Should I just start with the front page and use a brute-force algorithm, checking the ups, downs, and score of each story as I go until I have enough data? What other variables do I need to consider?
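The brute-force approach described here could look roughly like the sketch below. It assumes each page is a parsed JSON listing with posts under `data.children[].data` and a cursor under `data.after`, and that each post carries `title`, `ups`, `downs`, and `score` fields. `fetch_page` is a hypothetical callable supplied by the caller (for example, a thin wrapper around an HTTP request); it is not part of any library.

```python
def collect_posts(fetch_page, max_posts=1000):
    """Walk a listing page by page, recording votes and score for each post.

    fetch_page (hypothetical) takes an `after` cursor, or None for the first
    page, and returns one parsed listing as a dict.
    """
    posts = []
    after = None
    while len(posts) < max_posts:
        listing = fetch_page(after)["data"]
        children = listing["children"]
        for child in children:
            data = child["data"]
            posts.append({
                "title": data["title"],
                "ups": data["ups"],
                "downs": data["downs"],
                "score": data["score"],
            })
        after = listing.get("after")
        # Stop when the listing runs out of pages or returns nothing.
        if after is None or not children:
            break
    return posts[:max_posts]
```

Passing the fetcher in as a parameter keeps the paging logic separate from networking, so rate limiting and authentication can be handled in one place and the loop can be exercised with canned pages.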


Solution

  • What is the best way to go about finding stories that have low scores?

    Reddit's search functionality is most likely your best bet for finding low-scoring submissions by subreddit or set of subreddits. Unfortunately, it appears that neither the score nor the number of votes (up or down) is included in the search index. If you ask this question on /r/redditdev, you may get a favorable answer from /u/kemitche.

    Should I just start with the front page and use a brute-force algorithm, checking the ups, downs, and score of each story as I go until I have enough data?

    You may also want to contact /u/Deimorz, who has already done this [1, 2] and may be able to answer your questions.

    I want the top 10% and the lowest 10% of stories score-wise, so if you know of a way to find the total number of stories submitted to a subreddit, I'd love to hear about it!

    Unfortunately, unless you have monitored all submissions made to a subreddit over time or retroactively crawled all of reddit's submissions (as Deimorz has done), the only other option is to ask the reddit admins directly.
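Once a set of submissions has been gathered by monitoring or crawling, picking the top and bottom 10% is a matter of sorting the sample by score. This is a minimal sketch that assumes each post is a dict with a `score` key; the function name `top_and_bottom` and the `fraction` parameter are hypothetical. The percentages are relative to the collected sample, not to every story ever submitted to the subreddit.

```python
def top_and_bottom(posts, fraction=0.10):
    """Return (highest-scoring, lowest-scoring) slices of a collected sample."""
    if not posts:
        return [], []
    ranked = sorted(posts, key=lambda post: post["score"], reverse=True)
    # Keep at least one post on each side, even for very small samples.
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


sample = [{"title": f"post {i}", "score": i} for i in range(20)]
top, bottom = top_and_bottom(sample)
print([p["score"] for p in top])
print([p["score"] for p in bottom])
```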