
Ruby work distribution fails if threads are generated too fast


I ran into a problem the other day and spent two hours looking for the answer in the wrong place.

In the process I stripped the code down to the version below. The threading works as long as I have the sleep(0.1) in the loop that creates the threads.

If the line is omitted, all threads are created, but only thread 7 actually consumes data from the queue.

With this "hack" I do have a working solution, but not one I'm happy with. I'm really curious why this happens.

I am using a fairly old version of Ruby under Windows (2.4.1p111). However, I was able to reproduce the same behavior with a newer Ruby 3.0.2p107 installation.

#!/usr/bin/env ruby

@q = Queue.new
      
# Get all projects (would be a list of directories)
projects = [*0..100]
projects.each do |project|
  @q.push project
end

def worker(num)
  while not @q.empty?
    puts "Thread: #{num} Project: #{@q.pop}"
    sleep(0.5)
  end
end 


threads=[]
for i in 1..7 do
  threads << Thread.new { worker(i) }
  sleep(0.1) # Threading does not work without this line - but why?
end

threads.each {|thread| puts thread.join }

puts "done"

Solution

  • Fun bug! This is a race condition.

    It's not that only thread 7 is doing work. All threads reference the same variable i (a for loop does not create a new scope, so there is only one copy of i in memory). Since 7 is written last, presumably before any of the threads has started running, they all read the same i == 7.
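
    The capture behavior can be reproduced without any threads. The following is a minimal sketch, not part of the original code; the method names capture_with_for and capture_with_each are made up for illustration. A for loop reuses one i for every iteration, so blocks created inside it all see the final value once the loop has finished, whereas a block parameter is a fresh variable on every call.

    ```ruby
    # Collect lambdas inside a `for` loop. `for` does not open a new scope,
    # so every lambda closes over the same variable i.
    def capture_with_for
      procs = []
      for i in 1..3 do
        procs << lambda { i }
      end
      procs.map(&:call) # called only after the loop has finished
    end

    # Collect lambdas inside `each`. The block parameter i is a new
    # variable on each iteration, so every lambda keeps its own value.
    def capture_with_each
      procs = []
      (1..3).each do |i|
        procs << lambda { i }
      end
      procs.map(&:call)
    end

    p capture_with_for   # => [3, 3, 3]
    p capture_with_each  # => [1, 2, 3]
    ```

    A thread block behaves like these lambdas: it reads i when it runs, not when it is created.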

    Try this worker function and see if it doesn't clear things up:

    def worker(num)
      my_thread_id = Thread.current.object_id
    
      while not @q.empty?
        puts "Thread: #{num} NumObjId: #{num.object_id} ThreadId: #{my_thread_id} Project: #{@q.pop}"
        sleep(0.5)
      end
    end
    

    Notice that NumObjId is the same in all threads: they all received the same number. The ThreadId, however, is different for each thread, so all seven threads are in fact consuming from the queue.

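    If you really do need the number in each thread, give each thread its own variable. With each, the block parameter i is a new variable on every iteration, so each thread captures its own value. Something like: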

    ids = (1..7).to_a
    ids.each do |i|
      threads << Thread.new { worker(i) }
    end
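
    Another option, sketched here under the assumption that you want to keep the for loop: Thread.new passes its arguments on to the block, so the current value of i is copied into a block parameter at the moment the thread is created. The helper name spawn_numbered_workers is hypothetical, and the threads simply return their number instead of reading from a queue.

    ```ruby
    # Start `count` threads from a `for` loop and return the number each
    # thread actually saw. Thread.new(i) hands the current value of i to
    # the block as num, so later changes to i do not affect the thread.
    def spawn_numbered_workers(count)
      threads = []
      for i in 1..count do
        threads << Thread.new(i) { |num| num }
      end
      threads.map(&:value) # Thread#value joins and returns the block result
    end

    p spawn_numbered_workers(7) # => [1, 2, 3, 4, 5, 6, 7]
    ```

    Applied to the code in the question, the thread creation line would become threads << Thread.new(i) { |num| worker(num) }, and the sleep(0.1) would no longer be needed for the numbering to come out right.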