Tags: ruby, curl, net-http, open-uri

Only OpenURI succeeds at Reddit API request


I’m making requests to the Reddit API. First, I set the URL for a subreddit's top posts:

reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')

All of these correctly get the contents:

Net::HTTP.get(reddit_url, 'User-Agent' => 'My agent')

Open3.capture2('/usr/bin/curl', '--user-agent', 'My agent', reddit_url.to_s)[0]

URI.open(reddit_url, 'User-Agent' => 'My agent').read

But then I try it with a URL for a specific post:

reddit_url = URI.parse('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json')

Now both Net::HTTP and Open3/curl fail, returning only empty strings. URI.open still works, as does opening the URL in a web browser.

Why doesn’t the second request work with two of the three approaches? And why does it work with URI.open, when that’s supposed to be “an easy-to-use wrapper for Net::HTTP”? What does it do differently, and how can I replicate it with Net::HTTP and curl?


Solution

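  • Working with your example, and focusing on Net::HTTP for simplicity, I found that the first example doesn't work as written on Ruby versions whose Net::HTTP.get does not accept a headers hash as its second argument: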

    require 'net/http'
    reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
    Net::HTTP.get(reddit_url, 'User-Agent' => 'My agent')
    # raises TypeError (no implicit conversion of URI::HTTPS into String)
    

    Instead I used this as my starting point:

    require 'net/http'
    reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
    http = Net::HTTP.new(reddit_url.host, reddit_url.port)
    http.use_ssl = true
    result = http.get(reddit_url.request_uri, 'User-Agent' => 'My agent')
    puts result
    # => #<Net::HTTPOK:0x00007fc3ea8e7320>
    puts result.body.size
    # => 167,394
    

    With that working, we can try the second URL. Interestingly, I get different results depending on whether I reuse the first Net::HTTP object (created for www.reddit.com) or create a new one from the second URL (whose host is reddit.com, without the www):

    require 'net/http'
    reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
    reddit_url_two = URI.parse('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json')
    
    http = Net::HTTP.new(reddit_url.host, reddit_url.port)
    http.use_ssl = true
    result = http.get(reddit_url.request_uri, 'User-Agent' => 'My agent')
    puts result
    # => #<Net::HTTPOK:0x00007f931a143390>
    puts result.body.size
    # => 174,615
    
    http_two = Net::HTTP.new(reddit_url_two.host, reddit_url_two.port)
    http_two.use_ssl = true
    result_two = http_two.get(reddit_url_two.request_uri, 'User-Agent' => 'My agent')
    puts result_two
    # => #<Net::HTTPMovedPermanently:0x00007f931a148818>
    puts result_two.body.size
    # => 0
    
    result_reusing_connection = http.get(reddit_url_two.request_uri, 'User-Agent' => 'My agent')
    puts result_reusing_connection
    # => #<Net::HTTPOK:0x00007f931a0fb3b0>
    puts result_reusing_connection.body.size
    # => 141,575
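Checking only `body.size` makes the 301 look like an empty success, so it can help to print the status code and, for redirects, the `Location` header. A small sketch; `describe_response` is a hypothetical helper, not part of Net::HTTP:

```ruby
require 'net/http'

# Hypothetical helper (not part of Net::HTTP): summarise a response so that a
# redirect with an empty body is not mistaken for an empty 200.
def describe_response(response)
  summary = "#{response.code} #{response.message}"
  # Only redirects (3xx) carry a Location header worth showing.
  return summary unless response.is_a?(Net::HTTPRedirection)

  "#{summary} -> #{response['location'] || '(no Location header)'}"
end
```

For example, `puts describe_response(result_two)` should show the 301 together with the URL the server is redirecting to.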
    

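    So I suspect the 301 redirect is what's causing the confusion. The second URL uses reddit.com rather than www.reddit.com, and a request to that host gets a 301 Moved Permanently with an empty body, presumably pointing at www.reddit.com. URI.open follows redirects automatically (as web browsers do), which is why it still works; Net::HTTP never follows them on its own, and curl only does when given -L (--location). There's another question and answer on this site about how to follow redirects.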
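A minimal sketch of following the redirect with Net::HTTP, roughly what URI.open does for you. `fetch_following_redirects` and its `headers:`/`limit:` keywords are hypothetical names, and the sketch skips the extra checks OpenURI applies before following a redirect:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Net::HTTP): issue a GET and follow 3xx
# redirects, which Net::HTTP never does on its own.
# Makes at most `limit` requests before giving up.
def fetch_following_redirects(url, headers: {}, limit: 5)
  uri = URI.parse(url.to_s)

  limit.times do
    response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
      http.get(uri.request_uri, headers)
    end

    # Anything that is not a redirect (200, 404, ...) is returned as-is.
    return response unless response.is_a?(Net::HTTPRedirection)

    # The Location header may be relative, so resolve it against the current URL.
    uri = URI.join(uri.to_s, response['location'])
  end

  raise "too many redirects (limit: #{limit})"
end
```

With the question's second URL this would be called as `fetch_following_redirects(reddit_url_two, headers: { 'User-Agent' => 'My agent' })`. The headers are a keyword argument so that a string-keyed hash is never mistaken for keywords on Ruby 3.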
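On the curl side, the fix is curl's -L (--location) flag; without it curl prints the empty 301 body and still exits successfully. A sketch of the question's Open3 call with that flag added; `curl_args` and `curl_get` are hypothetical helpers, and it assumes `curl` is on the PATH rather than hard-coding `/usr/bin/curl`:

```ruby
require 'open3'

# Hypothetical helper: build the curl argument list. --location makes curl
# follow 3xx redirects (as URI.open does); --silent hides the progress meter
# and --show-error still reports failures on stderr.
def curl_args(url, user_agent: 'My agent')
  ['curl', '--silent', '--show-error', '--location', '--user-agent', user_agent, url.to_s]
end

# Hypothetical helper: run curl and return the response body as a String.
def curl_get(url, user_agent: 'My agent')
  body, errors, status = Open3.capture3(*curl_args(url, user_agent: user_agent))
  raise "curl failed (exit #{status.exitstatus}): #{errors}" unless status.success?

  body
end
```

Passing the arguments as separate strings (rather than one shell string) keeps the URL from being interpreted by a shell, just like the original `capture2` call.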