Parallel HTTP requests in Ruby


I have an array of URLs and I want to open each one and fetch a specific tag from it.
I want to do this in parallel.

Here is the pseudocode for what I want to do:

urls = [...]
tags = []
urls.each do |url|
  fetch_tag_asynchronously(url) do |tag|
    tags << tag
  end
end
wait_for_all_requests_to_finish()

If this could be done in a clean, safe way, that would be awesome.
I could use threads, but it doesn't look like arrays are thread-safe in Ruby.


Solution

  • You can achieve thread-safety by using a Mutex:

    require 'thread'  # for Mutex
    
    urls = %w(
      http://test1.example.org/
      http://test2.example.org/
      ...
    )
    
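    threads = []
    tags = []
    tags_mutex = Mutex.new  # guards every write to tags
    
    urls.each do |url|
      # Pass url as a thread argument so each thread works on its own URL.
      threads << Thread.new(url) do |u|
        tag = fetch_tag(u)  # fetch_tag is your own method that downloads the page and extracts the tag
        tags_mutex.synchronize { tags << tag }
      end
    end
    
    threads.each(&:join)  # wait for all requests to finish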
    

    However, starting a new thread for every URL can be counterproductive when the list is long, so capping the number of threads like this may perform better:

    THREAD_COUNT = 8  # tweak this number for maximum performance.
    
    tags = []
    mutex = Mutex.new
    
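    # Each worker keeps taking a URL off the shared array until none are left.
    # Note that this empties urls and works through it from the end.
    THREAD_COUNT.times.map {
      Thread.new(urls, tags) do |urls, tags|
        while (url = mutex.synchronize { urls.pop })
          tag = fetch_tag(url)
          mutex.synchronize { tags << tag }
        end
      end
    }.each(&:join)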
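
As a possible alternative to sharing an array and a mutex, the same bounded worker pool can be sketched with Ruby's thread-safe Queue. This is only a sketch under assumptions: fetch_tags_in_parallel and its thread_count parameter are hypothetical names that do not appear in the answer above, and the block passed to the method stands in for fetch_tag. Queue#close requires Ruby 2.3 or later.

```ruby
# Hypothetical helper (not from the original answer): calls the given block
# once per URL on a fixed number of worker threads and returns the results
# in the same order as urls. The urls array itself is not modified.
def fetch_tags_in_parallel(urls, thread_count: 8, &fetcher)
  queue = Queue.new  # thread-safe FIFO, so no Mutex is needed around it
  urls.each_with_index { |url, index| queue << [url, index] }
  queue.close        # once drained, a closed queue makes pop return nil

  workers = Array.new(thread_count) do
    Thread.new do
      local = []     # private to this thread, so appending is safe
      while (job = queue.pop)
        url, index = job
        local << [index, fetcher.call(url)]
      end
      local          # becomes the thread's value
    end
  end

  # Thread#value waits for the thread and returns what its block returned.
  pairs = workers.flat_map(&:value)
  pairs.sort_by(&:first).map(&:last)
end
```

For example, `fetch_tags_in_parallel(urls, thread_count: 8) { |url| fetch_tag(url) }` would return the tags in the same order as urls. Because each worker only appends to its own local array, no lock is needed around the results, and an exception raised inside the block surfaces in the caller when Thread#value is called.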