Search code examples
streamiteratorobservabledom-eventsecmascript-next

Events vs Streams vs Observables vs Async Iterators


Currently, the only stable way to process a series of async results in JavaScript is using the event system. However, three alternatives are being developed:

Streams: https://streams.spec.whatwg.org
Observables: https://tc39.github.io/proposal-observable
Async Iterators: https://tc39.github.io/proposal-async-iteration

What are the differences and benefits of each over events and the others?

Do any of these intend to replace events?


Solution

  • There are roughly two categories of APIs here: pull and push.

    Pull

    Async pull APIs are a good fit for cases where data is pulled from a source. This source might be a file, or a network socket, or a directory listing, or anything else. The key is that work is done to pull or generate data from the source when asked.

    Async iterators are the base primitive here, meant to be a generic manifestation of the concept of a pull-based async source. In such a source, you:

    • Pull from an async iterator by doing const promise = ai.next()
    • Wait for the result using const result = await promise (or using .then())
    • Inspect the result to find out if it's an exception (thrown), an intermediate value ({ value, done: false }), or a done signal ({ value: undefined, done: true }).

    This is similar to how sync iterators are a generic manifestation of the concept of a pull-based sync value source. The steps for a sync iterator are exactly the same as the above, omitting the "wait for the result" step.

    Readable streams are a special case of async iterators, meant to specifically encapsulate I/O sources like sockets/files/etc. They have specialized APIs for piping them to writable streams (representing the other half of the I/O ecosystem, sinks) and handling the resulting backpressure. They also can be specialized to handle bytes in an efficient "bring your own buffer" manner. This is all somewhat reminiscent of how arrays are a special case of sync iterators, optimized for O(1) indexed access.

    Another feature of pull APIs is that they are generally single-consumer. Whoever pulls the value, now has it, and it doesn't exist in the source async iterator/stream/etc. anymore. It's been pulled away by the consumer.

    In general, pull APIs provide an interface for communicating with some underlying source of data, allowing the consumer to express interest in it. This is in contrast to...

    Push

    Push APIs are a good fit for when something is generating data, and the data being generated does not care about whether anyone wants it or not. For example, no matter whether someone is interested, it's still true that your mouse moved, and then you clicked somewhere. You'd want to manifest those facts with a push API. Then, consumers---possibly multiple of them---may subscribe, to be pushed notifications about such things happening.

    The API itself doesn't care whether zero, one, or many consumers subscribe. It's just manifesting a fact about things that happened into the universe.

    Events are a simple manifestation of this. You can subscribe to an EventTarget in the browser, or EventEmitter in Node.js, and get notified of events that are dispatched. (Usually, but not always, by the EventTarget's creator.)

    Observables are a more refined version of EventTarget. Their primary innovation is that the subscription itself is represented by a first-class object, the Observable, which you can then apply combinators (such as filter, map, etc.) over. They also make the choice to bundle together three signals (conventionally named next, complete, and error) into one, and give these signals special semantics so that the combinators respect them. This is as opposed to EventTarget, where event names have no special semantics (no method of EventTarget cares whether your event is named "complete" vs. "asdf"). EventEmitter in Node has some version of this special-semantics approach where "error" events can crash the process, but that's rather primitive.

    Another nice feature of observables over events is that generally only the creator of the observable can cause it to generate those next/error/complete signals. Whereas on EventTarget, anyone can call dispatchEvent(). This separation of responsibilities makes for better code, in my experience.

    But in the end, both events and observables are good APIs for pushing occurrences out into the world, to subscribers who can tune in and tune out at any time. I'd say observables are the more modern way to do this, and nicer in some ways, but events are more widespread and well-understood. So if anything was intended to replace events, it'd be observables.

    Push <-> pull

    It's worth noting that you can build either approach on top of the other in a pinch:

    • To build push on top of pull, constantly be pulling from the pull API, and then push out the chunks to any consumers.
    • To build pull on top of push, subscribe to the push API immediately, create a buffer that accumulates all results, and when someone pulls, grab it from that buffer. (Or wait until the buffer becomes non-empty, if your consumer is pulling faster than the wrapped push API is pushing.)

    The latter is generally much more code to write than the former.

    Another aspect of trying to adapt between the two is that only pull APIs can easily communicate backpressure. You can add a side-channel to push APIs to allow them to communicate backpressure back to the source; I think Dart does this, and some people try to create evolutions of observables that have this ability. But it's IMO much more awkward than just properly choosing a pull API in the first place. The flip side of this is that if you use a push API to expose a fundamentally pull-based source, you will not be able to communicate backpressure. This is the mistake made with the WebSocket and XMLHttpRequest APIs, by the way.

    In general I find attempts to unify everything into one API by wrapping others misguided. Push and pull have distinct, not-very-overlapping areas where they each work well, and saying that we should pick one of the four APIs you mentioned and stick with it, as some people do, is shortsighted and leads to awkward code.