ruby, curl, curb

Ruby/curl link expander method is downloading the full target URL


I made a handy little link expander using curl (via the curb gem) within my Ruby (Sinatra) app.

  require 'curb'

  def curbexpand(link)
    result = Curl::Easy.new(link)
    begin
      result.headers["User-Agent"] = "..."
      result.verbose = true
      result.follow_location = true
      result.max_redirects = 3
      result.connect_timeout = 5
      result.perform
      result.last_effective_url # Returns the final destination URL after following up to 3 redirects
    rescue StandardError
      # Log the failure, then fall back to the original link
      puts "XXXXXXXXXXXXXXXXXXX Error parsing link XXXXXXXXXXXXXXXXXXXXXXXXXXX"
      link
    end
  end

The problem is that some geniuses are using URL shorteners to link to .exe and .dmg files. That would be fine, but my curl code above appears to wait for the full response to be downloaded (e.g. a 1 GB file!) before returning the URL. I don't want to use third-party link expander APIs, as I have a significant volume of links to expand.

Does anyone know how I can tweak curb to find just the final URL without waiting for the full response?


Solution

  • I've done what you want using Net::HTTP to send HEAD requests and look for redirects that way. The advantage is that a HEAD request returns only headers, never the content.

    From the docs:

    head(path, initheader = nil) 
    
    Gets only the header from path on the connected-to host. header is a Hash like { 'Accept' => '*/*', … }.
    
    This method returns a Net::HTTPResponse object.
    
    This method never raises an exception.
    
    require 'net/http'
    
    response = nil
    Net::HTTP.start('some.www.server', 80) {|http|
      response = http.head('/index.html')
    }
    p response['content-type']
    

    Combine that with the example in the Net::HTTP docs for following redirection, and you should be able to find your landing URL.

    You can probably use curb's Curl::Easy.http_head (or the instance method Curl::Easy#http_head) to accomplish much the same thing.
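Combining HEAD requests with manual redirect following, as suggested above, might look roughly like the following. This is a minimal, stdlib-only sketch, not the answerer's actual code, and it rests on a few assumptions: the method name `head_expand` and its `max_redirects:`/`timeout:` keyword arguments are hypothetical (chosen to mirror the options in the question's curb code), relative `Location` headers are resolved with `URI.join`, and any error falls back to returning the original link, as the question's code intends.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: expands a (shortened) link by following redirects
# with HEAD requests only, so the final target's body is never downloaded.
# Falls back to the original link on any error.
def head_expand(link, max_redirects: 3, timeout: 5)
  uri = URI.parse(link)
  max_redirects.times do
    response = Net::HTTP.start(uri.host, uri.port,
                               use_ssl: uri.scheme == 'https',
                               open_timeout: timeout,
                               read_timeout: timeout) do |http|
      http.head(uri.request_uri) # headers only, no body
    end

    # Not a redirect (200, 404, 405, ...): treat this as the landing URL.
    location = response['location']
    return uri.to_s unless response.is_a?(Net::HTTPRedirection) && location

    # Location may be relative (e.g. "/foo"), so resolve it against the current URL.
    uri = URI.join(uri.to_s, location)
  end
  # Redirect limit reached: return the last Location seen (it was not requested).
  uri.to_s
rescue StandardError
  link
end
```

It can be called as `head_expand(link)` wherever `curbexpand(link)` was used. Because every hop is a HEAD request, a short link pointing at a 1 GB .dmg costs only a few header exchanges. One trade-off: a server that rejects HEAD (for example with a 405) is simply treated as the landing page.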