ruby multithreading iterator

Why does #join on a Thread object work differently when called with an iterator than with a loop?


Applying #join on Thread objects inside a loop executes them sequentially.

5.times do |x|
  Thread.new {
    t = rand(1..5) * 0.25
    sleep(t)
    puts "Thread #{x}:  #{t} seconds"
  }.join
end

# Output
# Thread 0:  1.25 seconds
# Thread 1:  1.25 seconds
# Thread 2:  0.5 seconds
# Thread 3:  0.75 seconds
# Thread 4:  0.25 seconds

On the other hand, applying #join to an array of Thread objects with an iterator executes them concurrently. Why?

threads = []

5.times do |x|
  threads << Thread.new {
    t = rand(1..5) * 0.25
    sleep(t)
    puts "Thread #{x}:  #{t} seconds"
  }
end

threads.each(&:join)

# Output
# Thread 1:  0.25 seconds
# Thread 3:  0.5 seconds
# Thread 0:  1.0 seconds
# Thread 4:  1.0 seconds
# Thread 2:  1.25 seconds

Solution

  • There are several points to address here.

    When a thread starts

    Creating a thread with Thread.new, Thread.start, or Thread.fork starts running its block right away, concurrently with the main thread. However, when the main thread of a short script reaches the end without 'joining' the new thread, the Ruby process exits and kills any threads still running, so the new thread typically never gets a chance to finish. To a newcomer, this gives the false impression that #join is what starts the thread.

    thread = Thread.new {
       puts "Here's a thread"
    }
    
    # (Usually no output: the script exits before the thread's puts runs)
    

    Adding a short delay to the calling main thread gives the called thread a chance to finish.

    thread = Thread.new {
       puts "Here's a thread"
    }
    
    sleep(2)
    
    # Here's a thread
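
    To check that a thread really is running before anything joins it, one option is to have it signal the main thread. This is a minimal sketch, not part of the original answer; the names started, worker, and alive_before_join are illustrative assumptions.

```ruby
started = Queue.new

worker = Thread.new do
  started << :running   # signal the main thread as soon as the block begins
  sleep(0.1)            # stand-in for real work
  :done                 # the block's return value, available via #value
end

signal = started.pop                # blocks until the worker has actually begun
alive_before_join = worker.alive?   # the worker is mid-sleep here, not yet joined
result = worker.value               # #value joins the thread and returns :done

puts "signal=#{signal} alive_before_join=#{alive_before_join} result=#{result}"
```

    The worker signals through the queue before #join or #value is ever called, so joining cannot be what started it.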
    

    What #join actually does

    #join blocks the calling thread (here, the main thread), and only the calling thread, until the thread it was called on has finished. Any other threads that were already started are not affected; they have been running concurrently and continue to do so.
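
    The behaviour described above can be observed directly. The following is a sketch under assumed fixed delays; slow, fast, and straggler are hypothetical names, and the timings are approximate.

```ruby
now = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

slow = Thread.new { sleep(0.3); :slow }
fast = Thread.new { sleep(0.05); :fast }

started_at = now.call
joined = slow.join                   # the main thread waits here for `slow` only
waited = now.call - started_at       # roughly 0.3 seconds

fast_finished = !fast.alive?         # `fast` ran to completion in the meantime
same_object = joined.equal?(slow)    # #join returns the thread it was called on

# With a time limit, #join gives up and returns nil if the thread is still running.
straggler = Thread.new { sleep(0.5) }
timed_out = straggler.join(0.05)
straggler.kill                       # clean up so the script can exit promptly
```

    While the main thread was blocked on slow, fast kept running and finished on its own, which is the point: #join pauses the caller, not the other threads.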

    The original examples explained

    In the first example, the loop starts a thread, and immediately 'joins' it. Since #join blocks the main thread, the loop is paused until the first thread is completed. Then the loop iterates, starts a second thread, 'joins' it, and pauses the loop once again until this thread is completed. It's purely sequential and completely negates the point of threads.

    5.times do |x|
      Thread.new {
        t = rand(1..5) * 0.25
        sleep(t)
        puts "Thread #{x}:  #{t} seconds"
      }.join                              # <--- this #join is the culprit.
    end
    

    User Solomon Slow put it best in his comment in the original post.

    It never makes sense to "join" a thread immediately after creating it. The only reason for ever creating a thread is if the caller is going to do something else while the new thread is running. In your second example, the "something else" that the caller does is, it creates more threads.

    The second example does multithreading right. The loop starts a thread, iterates, starts the next thread, iterates, and so on. Because we haven't used #join inside the loop, the main thread keeps iterating and starts all the threads.
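
    One way to see the difference between the two examples is to time them. This is a rough sketch that assumes fixed delays instead of rand so the totals are predictable; elapsed_seconds and delays are illustrative names, not from the original post.

```ruby
# Hypothetical helper: returns how long the given block took, in seconds.
def elapsed_seconds
  started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at
end

delays = [0.2, 0.1, 0.3]

# First example's pattern: join inside the loop, so each thread runs alone.
sequential = elapsed_seconds do
  delays.each { |d| Thread.new { sleep(d) }.join }
end

# Second example's pattern: start every thread first, then join them all.
concurrent = elapsed_seconds do
  delays.map { |d| Thread.new { sleep(d) } }.each(&:join)
end

puts format("sequential: %.2fs, concurrent: %.2fs", sequential, concurrent)
```

    The sequential total should come out near the sum of the delays (about 0.6 seconds), while the concurrent total should be near the longest single delay (about 0.3 seconds).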

    So how does using #join in an iterator not pose the same problem as the first example? Because these threads have already been running concurrently. Remember #join only blocks the main thread until the 'joined' thread is complete. This called thread and all other called threads have been running since the loop that created them, and they will continue to run and finish independently of the main thread and of each other. 'Joining' all threads sequentially just tells the main thread:

    • Don't continue until Thread 0 is done (it may already have finished, and so may some, all, or none of the other threads).
    • Don't continue until Thread 1 is done (it may already have finished, and so may some, all, or none of the remaining threads).
    • ...
    • Don't continue until Thread 4 is done (it may already have finished; all the other threads have definitely finished by this point).

    In effect, the last line of the second example makes the main thread wait on each thread in turn, but it does not hinder the threads themselves:

    threads.each(&:join)
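
    As a small illustration of the "may already have finished" case, joining a thread that is already done returns almost immediately. This is a sketch with assumed names (quick, join_cost); the sleep is only there to give the thread ample time to finish.

```ruby
quick = Thread.new { :finished }
sleep(0.1)                              # give `quick` plenty of time to finish
finished_before_join = !quick.alive?

started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
quick.join                              # nothing left to wait for
join_cost = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at

result = quick.value                    # the block's return value is still available
puts "finished_before_join=#{finished_before_join} result=#{result}"
```

    So in threads.each(&:join), only the joins on threads that are still running actually make the main thread wait.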
    

    I also found this explanation very helpful.