
Clojure: how much asynchrony is necessary in a process pipeline?


In Clojure you can build pipelines involving steps (s), some expensive (S), in a multi-step process bound together synchronously (->) or asynchronously (~>) via comp or chan respectively. I am trying to understand at what granularity channels are necessary to avoid blocking and improve performance.

We could use channels to connect every step, but this seems like unnecessary overhead.

~> s ~> s ~> S ~> s

Or we could use a single channel up front and synchronously compose the other steps. This appears to me enough to avoid blocking the main process.

~> s -> s -> S -> s

I think this is the same as:

~> S

Would one prefer more to fewer channels? Why?

I'm thinking of the second example as being similar to calling a web worker in the browser in that once one crosses a boundary, how the backend bits are connected won't much impact the main thread.
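To make the two wirings concrete, here is a minimal sketch of both, assuming core.async is on the classpath. The step functions (`step-a`, `step-b`, `expensive`, `step-c`) and the helpers `stage`, `run-single-boundary` and `run-channel-per-step` are hypothetical names invented for illustration, and `a/thread` stands in for "crossing the boundary" off the main thread.

```clojure
(require '[clojure.core.async :as a])

;; Hypothetical steps: three cheap ones (s) and one expensive one (S).
(defn step-a [x] (inc x))
(defn step-b [x] (* 2 x))
(defn expensive [x] (Thread/sleep 50) (* x x))
(defn step-c [x] (dec x))

;; ~> s -> s -> S -> s : one asynchronous boundary, plain composition inside.
;; comp applies right to left, so the first step is listed last.
(def pipeline (comp step-c expensive step-b step-a))

(defn run-single-boundary [x]
  ;; a/thread runs the body on another thread and returns a channel
  ;; that receives the body's result.
  (a/thread (pipeline x)))

;; ~> s ~> s ~> S ~> s : a channel (and a thread) between every step.
(defn stage [f in]
  (let [out (a/chan)]
    (a/thread
      (loop []
        (if-some [v (a/<!! in)]
          (do (a/>!! out (f v)) (recur))
          (a/close! out))))   ; input closed, so close the output too
    out))

(defn run-channel-per-step [x]
  (let [in (a/chan 1)]
    (a/>!! in x)
    (a/close! in)
    (->> in
         (stage step-a)
         (stage step-b)
         (stage expensive)
         (stage step-c))))

;; Both variants hand back a channel; the caller blocks only when it takes.
(def result-1 (a/<!! (run-single-boundary 3)))
(def result-2 (a/<!! (run-channel-per-step 3)))
```

Both variants compute the same value for a single input; the second only adds three extra channels and threads between steps that have to wait on each other anyway.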


Solution

  • You don't gain performance by splitting up synchronous steps and running them synchronized via channels, no matter how many threads, workers or machines you run the steps on. Instead, you lose performance to the coordination overhead.

    It doesn't matter how expensive the individual steps are. There is no level of granularity at which refactoring into processes synchronized via channels becomes preferable.

    When a computation demands that its steps are executed synchronously, i.e. each step requires the result of the previous one, executing them synchronously is the best-performing thing you can do.

    Channels are useful when you want to pause a computation until it can continue with data provided by one or more input sources. This lets you factor in non-deterministic events, such as deciding at runtime to prefer one input source because it happens to be available before the other.

    As a coordination utility, channels never increase performance. Everything you can do with channels could be done with better performance without them, but doing so is usually tedious and more error-prone. Also, correctly synchronizing multiple threads is extremely difficult without a design constraint such as channels. That is why, in most cases, the slight overhead is a cheap price to pay.
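The case where channels do earn their keep, continuing with whichever input source happens to be ready first, can be sketched with core.async's `alts!!`. This is only an illustration under assumptions: `source` and `first-available` are hypothetical helpers, and the labels and delays are made up to simulate two inputs of different speed.

```clojure
(require '[clojure.core.async :as a])

;; Hypothetical input source: delivers its label after delay-ms milliseconds.
(defn source [label delay-ms]
  (a/thread
    (Thread/sleep (long delay-ms))
    label))

(defn first-available
  "Blocks until one of the channels yields a value, then returns
  a vector of that value and the channel it came from."
  [& chans]
  (a/alts!! (vec chans)))

;; Which source wins is decided at runtime by whichever is ready first.
(def winner
  (let [slow (source :slow 200)
        fast (source :fast 20)
        [value _ch] (first-available slow fast)]
    value))

(println "continued with input from" winner)
```

Writing the same "take whichever arrives first" logic by hand with locks or polling is possible, but it is exactly the tedious, error-prone coordination the channel abstraction spares you.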