Scraping Reddit using Nokogiri (429 too many requests)

I'm trying to scrape Reddit with Nokogiri, but a single run of this keeps telling me that I'm putting in too many requests.

require 'nokogiri'
require 'open-uri'
url = "https://www.reddit.com/r/all"
redditscrape = Nokogiri::HTML(open(url))

OpenURI::HTTPError: 429 Too Many Requests

Isn't this only one request? If it's not, how do I create sleep intervals for Nokogiri?

Solution

Reddit has an API

You could probably query the API for the particular sub-reddit(s) you want to scrape. Attempting to scrape all of Reddit just seems like a nightmare waiting to happen considering the high volume and the nested comments.

It looks like Reddit is blocking the ability to scrape in favor of using their public API.

barplot multiple aggregation
Start a PowerShell script in R via system2()
plot from sankeyNetwork in networkD3 does not show output (issue is not number of unique nodes)
"Target position can only be set for new windows" in chromote in R
Extract the correct data type in a PDF table
Time conversion in R
Comparing the values of a certain number previous rows with the current row
Run a single test function in R's testthat
rpart package installation in R
An efficient way to assign value based on a min-max range and category
Change output of the `purrr::map` function
osmdata_sf returns failed to perform HTTP request curl::curl_fetch_memory() error in R?
Comparing nls() to nls2() - what am I doing wrong
How to add "variables grid" below ggplot
How can I use predefined code snippets outside of code chunks in Quarto within RStudio/Posit?
Wrap text for collapse rows in KableExtra for a long table in R
Implementation of Breusch-Pagan test for random effects in plm with unbalanced panels
Finding a value of a dataset in different ones
Replicate matrix
Unexpected results after converting raster data from geographic to projected coordinate system using the terra package
How to remove rows by condition in R?
How do I add an alias for magrittr pipe from R in vscode
Package ‘neuralnet’ in R, rectified linear unit (ReLU) activation function?
Reticulate with pywin32, a dependency of xlwings, not found when sourcing python script from R
Sub-subtitle in a graph made with `ggplot2`
How can I execute a statement and ignore warnings with tryCatch?
Enumerate events where n consecutive values are not NA
Serialize/deserialize a column with R and DuckDB
How can I structure my sapflow data to analyze using "TREX" package in R?
Putting multiple plots on the same page in R?