
How do enumerators created with a code block actually run?


It's just a simple question: how is the y.<< method able to halt the code block mid-execution?

I expected the code block to run only once and never halt in the middle.

e = Enumerator.new do |y|
    puts "Ruby"
    y << 1
    y << 2
    puts "Ruby"
    y << 3
end

puts e.each.next
puts e.each.next
puts e.each.next
e.rewind
puts e.each.next
puts e.each.next
puts e.each.next

Solution

  • Almost all Ruby implementations are Free Software and Open Source, so you can just look at the source code to see how it is implemented.

    In Rubinius, the most interesting part is Enumerator::Iterator#reset, implemented in core/enumerator.rb:

    @fiber = Fiber.new stack_size: STACK_SIZE do
      obj = @object
      @result = obj.each { |*val| Fiber.yield *val }
      @done = true
    end
    

    and Enumerator::Iterator#next:

    val = @fiber.resume
    

    TruffleRuby's implementation is very similar, as you can see in src/main/ruby/truffleruby/core/enumerator.rb:

    class FiberGenerator
      # irrelevant methods omitted
    
      def next
        reset unless @fiber
    
        val = @fiber.resume
    
        raise StopIteration, 'iteration has ended' if @done
    
        val
      end
    
      def reset
        @done = false
        @fiber = Fiber.new do
          obj = @object
          @result = obj.each do |*val|
            Fiber.yield(*val)
          end
          @done = true
        end
      end
    end
    

    JRuby is also very similar, as you can see in core/src/main/ruby/jruby/kernel/enumerator.rb:

    class FiberGenerator
      # irrelevant methods omitted
    
      def next
        reset unless @fiber&.__alive__
    
        val = @fiber.resume
    
        raise StopIteration, 'iteration has ended' if @state.done
    
        val
      end
    
      def reset
        @state.done = false
        @state.result = nil
        @fiber = Fiber.new(&@state)
      end
    
    end
    

    MRuby's implementation is very similar, as you can see in mrbgems/mruby-enumerator/mrblib/enumerator.rb.

    YARV also uses Fibers, as can be seen in enumerator.c, for example here:

    static void
    next_init(VALUE obj, struct enumerator *e)
    {
        VALUE curr = rb_fiber_current();
        e->dst = curr;
        e->fib = rb_fiber_new(next_i, obj);
        e->lookahead = Qundef;
    }
    
    static VALUE
    get_next_values(VALUE obj, struct enumerator *e)
    {
        VALUE curr, vs;
    
        if (e->stop_exc)
            rb_exc_raise(e->stop_exc);
    
        curr = rb_fiber_current();
    
        if (!e->fib || !rb_fiber_alive_p(e->fib)) {
            next_init(obj, e);
        }
    
        vs = rb_fiber_resume(e->fib, 1, &curr);
        if (e->stop_exc) {
            e->fib = 0;
            e->dst = Qnil;
            e->lookahead = Qundef;
            e->feedvalue = Qundef;
            rb_exc_raise(e->stop_exc);
        }
        return vs;
    }
    

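    So, not surprisingly, Enumerator is implemented using Fibers in many Ruby implementations. That is what answers your question: the block runs inside a Fiber, every value passed to y << ends up in Fiber.yield, which suspends the block at that point, and the following call to next resumes it from there. Fiber is essentially just Ruby's name for semi-coroutines, and of course, coroutines are a popular way of implementing generators and iterators. E.g. CPython and CoreCLR also implement generators using coroutines.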
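
To make the Fiber mechanism from the implementations quoted above concrete, here is a minimal sketch of a generator built directly on Fiber. The class names TinyYielder and TinyGenerator are hypothetical and exist only for this illustration; no Ruby implementation uses this exact code, and real Enumerators handle many more cases (lookahead, feed values, multiple yielded values).

```ruby
# Hypothetical illustration classes, not part of any Ruby implementation.
class TinyYielder
  # Suspend the fiber that is running the block and hand `value`
  # back to whoever called Fiber#resume.
  def <<(value)
    Fiber.yield(value)
    self # return self so that `y << 1 << 2` can be chained
  end
end

class TinyGenerator
  def initialize(&block)
    @block = block
    rewind
  end

  # Discard the old fiber; the following call to #next starts the block afresh.
  def rewind
    @done = false
    @fiber = Fiber.new do
      @block.call(TinyYielder.new)
      @done = true
      nil
    end
    self
  end

  # Run the block until its next `y << value`, then return that value.
  def next
    raise StopIteration, 'iteration has ended' if @done
    value = @fiber.resume
    raise StopIteration, 'iteration has ended' if @done
    value
  end
end

log = []
gen = TinyGenerator.new do |y|
  log << :start
  y << 1
  y << 2
  log << :middle
  y << 3
end

first  = gen.next # runs the block up to `y << 1`, then suspends it
second = gen.next # resumes after `y << 1`, suspends again at `y << 2`
third  = gen.next # resumes, records :middle, suspends at `y << 3`
```

The block is never "halted" from the outside: it suspends itself each time it calls `y <<`, because that call reaches Fiber.yield, and it only continues when the caller asks for the following value.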

    One exception to this seems to be Opal. My assumption was that Opal would use ECMAScript Generators to implement Ruby Enumerators, but it does not look like that is the case. The implementation of Ruby Enumerators in Opal is found in opal/corelib/enumerator.rb, opal/corelib/enumerator/generator.rb, and opal/corelib/enumerator/yielder.rb with some help from opal/corelib/runtime.js, but unfortunately, I don't fully understand it. It does not appear to use either Ruby Fibers or ECMAScript Generators, though.

    By the way, your usage of Enumerators is somewhat strange: you call Enumerator#each six times without a block, but calling Enumerator#each without a block just returns the Enumerator itself:

    each → enum

    Iterates over the block according to how this Enumerator was constructed. If no block and no arguments are given, returns self.

    So, in other words, all those calls to Enumerator#each are just no-ops. It would make much more sense to just call Enumerator#next directly:

    puts e.next
    puts e.next
    puts e.next
    e.rewind
    puts e.next
    puts e.next
    puts e.next
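
As a small check of the two points above, the following sketch reuses the Enumerator from the question, confirms that each without a block returns the receiver, and annotates what each call to next prints. The variable name same is only for this illustration.

```ruby
e = Enumerator.new do |y|
  puts "Ruby"
  y << 1
  y << 2
  puts "Ruby"
  y << 3
end

same = e.each.equal?(e) # true: each without a block returns the receiver itself

puts e.next # prints "Ruby", then 1
puts e.next # prints 2
puts e.next # prints "Ruby", then 3

begin
  e.next    # the block has finished, so there is nothing left to yield
rescue StopIteration
  puts "done"
end

e.rewind    # the following call to next starts the block again from the top
puts e.next # prints "Ruby", then 1
```

Note that the first "Ruby" is printed only when the first value is requested, not when the Enumerator is created, and that it is printed again after rewind because the block restarts from its first line.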