What are the similarities and differences between Scala Transducers and Clojure Transducers?


Paul Chiusano and Rúnar Óli have written a fantastic book, Functional Programming in Scala. In it they mention a concept that is little referenced in the Scala community: transducers.

Scala Transducers in the book Functional Programming In Scala

In the Clojure community, transducers get a little more press.

My question is: What are the similarities and differences between Scala Transducers (from the book Functional Programming in Scala) and Clojure Transducers?

Assumptions:

I'm aware that

  1. Transducers are common parlance from their concept in Electrical Engineering

  2. There is a pre-existing concept in Computer Science called a Finite State Transducer

  3. There is a precedent in Biology and Psychology adopting the word transduction

  4. There is already a history of other technical books like SICP adopting the word Transducers.


Solution

  • The stream transducers from the book Functional Programming in Scala (FPiS) and Clojure's transducers are quite similar. Both generalise the idea of a "machine" (step function) that processes an input stream into an output stream. FPiS's transducers are called Processes. Rich Hickey also uses the term process in his introductory talk on transducers in Clojure.

    Origins

    The design of FPiS's transducers is based on Mealy machines. Mealy machines are said to have:

    transition function T : (S, I) -> S
    output function     G : (S, I) -> O
    

    These functions can be fused together to form:

    step: (S, I) -> (S, O)
    

    Here it's easy to see that the step function takes the current state of the machine and the next input item, and produces the next state of the machine and an output item.
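
As a minimal sketch of the fused step function, a Mealy-style machine can be driven over a list as shown here. The running-total machine and the names `step` and `run` are hypothetical, chosen only for illustration.

```scala
// A Mealy-style step function: (state, input) => (next state, output).
// Hypothetical machine: the state is a running total, the output is the total so far.
def step(total: Int, input: Int): (Int, Int) = {
  val next = total + input
  (next, next)
}

// Drive any such machine over a list, threading the state and collecting the outputs.
def run[S, I, O](initial: S, inputs: List[I])(step: (S, I) => (S, O)): List[O] = {
  val (_, outputsReversed) = inputs.foldLeft((initial, List.empty[O])) {
    case ((state, acc), input) =>
      val (nextState, output) = step(state, input)
      (nextState, output :: acc)
  }
  outputsReversed.reverse
}

val totals = run(0, List(1, 2, 3, 4))(step)
```

The driver only needs the initial state and the step function, so the same machine could just as well be fed from a stream or a channel.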

    One of the combinators from FPiS uses such a step function:

    trait Process[I, O] {
      ...
      def loop[S, I, O](z: S)(f: (I,S) => (O,S)): Process[I, O]
      ...
    }
    

    This loop function is essentially the seeded left reduction that Hickey talks about in this slide.
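
To make the connection to a seeded left reduction concrete, here is a rough sketch. It assumes a stripped-down version of the Process type described further down in this answer, and a hypothetical list driver named `runList`; it is not the book's exact code.

```scala
// Simplified Process type (an assumption for this sketch).
sealed trait Process[I, O]
case class Await[I, O](recv: Option[I] => Process[I, O]) extends Process[I, O]
case class Emit[I, O](head: O, tail: Process[I, O]) extends Process[I, O]
case class Halt[I, O]() extends Process[I, O]

// loop threads a state S through the machine and emits one output per input.
def loop[S, I, O](z: S)(f: (I, S) => (O, S)): Process[I, O] =
  Await[I, O] {
    case Some(i) =>
      val (o, s2) = f(i, z)
      Emit(o, loop(s2)(f))
    case None => Halt()
  }

// Hypothetical driver: feeds a List into a Process and collects what it emits.
def runList[I, O](p: Process[I, O], in: List[I]): List[O] = p match {
  case Halt()      => Nil
  case Emit(h, t)  => h :: runList(t, in)
  case Await(recv) => in match {
    case x :: xs => runList(recv(Some(x)), xs)
    case Nil     => runList(recv(None), Nil)
  }
}

// A running sum: the seed is 0 and each step adds the input to the state.
val sums = runList(loop(0)((i: Int, s: Int) => (i + s, i + s)), List(1, 2, 3))
// The same values fall out of a seeded left scan over the list.
val viaScan = List(1, 2, 3).scanLeft(0)(_ + _).tail
```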

    Context agnostic

    Both can be used in many different contexts (such as lists, streams, channels, etc.).

    In FPiS transducers, a process type is:

    trait Process[I, O]
    

    All it knows about are its input elements and its output elements.

    In Clojure, it's a similar story. Hickey calls this "fully decoupled".

    Composition

    Both types of transducers can be composed.

    FPiS uses a "pipe" operator

    map(labelHeavy) |> filter(_.nonFood)
    

    Clojure uses comp

    (comp
      (filtering non-food?)
      (mapping label-heavy))
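
A sketch of how piping two processes together could work, assuming a simplified Process type. The names `pipe`, `lift`, `filter` and `runList` are this sketch's own; here the pipe is written as a free function for brevity, whereas the FPiS snippet above uses the `|>` operator.

```scala
sealed trait Process[I, O]
case class Await[I, O](recv: Option[I] => Process[I, O]) extends Process[I, O]
case class Emit[I, O](head: O, tail: Process[I, O]) extends Process[I, O]
case class Halt[I, O]() extends Process[I, O]

// Turns a plain function into a process that maps every input.
def lift[I, O](f: I => O): Process[I, O] = Await[I, O] {
  case Some(i) => Emit(f(i), lift(f))
  case None    => Halt()
}

// Passes through only the inputs that satisfy the predicate.
def filter[I](p: I => Boolean): Process[I, I] = Await[I, I] {
  case Some(i) if p(i) => Emit(i, filter(p))
  case Some(_)         => filter(p)
  case None            => Halt()
}

// Feeds the output of p1 into p2 without building an intermediate collection.
def pipe[I, M, O](p1: Process[I, M], p2: Process[M, O]): Process[I, O] = p2 match {
  case Halt()     => Halt()
  case Emit(h, t) => Emit(h, pipe(p1, t))
  case Await(f)   => p1 match {
    case Emit(h, t) => pipe(t, f(Some(h)))
    case Halt()     => pipe(Halt[I, M](), f(None))
    case Await(g)   => Await((i: Option[I]) => pipe(g(i), p2))
  }
}

// Hypothetical driver over a List.
def runList[I, O](p: Process[I, O], in: List[I]): List[O] = p match {
  case Halt()      => Nil
  case Emit(h, t)  => h :: runList(t, in)
  case Await(recv) => in match {
    case x :: xs => runList(recv(Some(x)), xs)
    case Nil     => runList(recv(None), Nil)
  }
}

val pipeline = pipe(lift((n: Int) => n * 10), filter((n: Int) => n > 15))
val result = runList(pipeline, List(1, 2, 3))
```

Each element flows through the map stage and then the filter stage before the next element is requested.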
    

    Representation

    In Clojure:

    reducer:    (whatever, input) -> whatever
    transducer: reducer -> reducer
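
The Clojure shape can be mimicked in Scala. This is only a sketch: it ignores the zero-arity (init) and one-arity (completion) variants as well as early termination, and `Transducer`, `mapping`, `filtering` and `andThen` are names made up here.

```scala
// A reducer is a function (result, input) => result.
// A transducer turns a reducer over B into a reducer over A, for any result type R.
trait Transducer[A, B] { self =>
  def apply[R](rf: (R, B) => R): (R, A) => R

  // Composition: `self` sees each input first, then hands it on to `next`.
  def andThen[C](next: Transducer[B, C]): Transducer[A, C] = new Transducer[A, C] {
    def apply[R](rf: (R, C) => R): (R, A) => R = self.apply(next.apply(rf))
  }
}

def mapping[A, B](f: A => B): Transducer[A, B] = new Transducer[A, B] {
  def apply[R](rf: (R, B) => R): (R, A) => R = (r, a) => rf(r, f(a))
}

def filtering[A](p: A => Boolean): Transducer[A, A] = new Transducer[A, A] {
  def apply[R](rf: (R, A) => R): (R, A) => R = (r, a) => if (p(a)) rf(r, a) else r
}

// Keep the even numbers, then add one to each.
val xf = filtering[Int](_ % 2 == 0).andThen(mapping[Int, Int](_ + 1))

// The transducer knows nothing about lists; the final reducer decides how to accumulate.
val step = xf.apply[List[Int]]((acc, x) => acc :+ x)
val result = List(1, 2, 3, 4).foldLeft(List.empty[Int])(step)
```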
    

    In FPiS:

    // The main type is
    trait Process[I, O]
    
    // Many combinators have the type
    Process[I, O] ⇒ Process[I, O]
    

    However, FPiS's representation isn't just a function under the hood. It's a case-class (algebraic data type) with 3 variants: Await, Emit, and Halt.

    case class Await[I,O](recv: Option[I] => Process[I,O]) extends Process[I,O]
    case class Emit[I,O](head: O, tail: Process[I,O]) extends Process[I,O]
    case class Halt[I,O]() extends Process[I,O]
    
    • Await plays the part of the reducer->reducer function from Clojure.
    • Halt plays the part of reduced in Clojure.
    • Emit takes the place of calling the next step function in Clojure.

    Early termination

    Both support early termination. Clojure does it using a special value called reduced which can be tested for via the reduced? predicate.

    FPiS uses a more statically typed approach: a Process can be in one of 3 states, Await, Emit or Halt. When a "step function" returns a process in the Halt state, the processing function knows to stop.
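
A small sketch of the Halt-based approach, assuming a simplified Process type and a hypothetical driver `run` over a LazyList (which requires Scala 2.13 or later). Because `take` returns Halt once its count is used up, the driver stops pulling from the source even though the source is infinite.

```scala
sealed trait Process[I, O]
case class Await[I, O](recv: Option[I] => Process[I, O]) extends Process[I, O]
case class Emit[I, O](head: O, tail: Process[I, O]) extends Process[I, O]
case class Halt[I, O]() extends Process[I, O]

// Passes through at most n elements, then halts.
def take[I](n: Int): Process[I, I] =
  if (n <= 0) Halt()
  else Await[I, I] {
    case Some(i) => Emit(i, take[I](n - 1))
    case None    => Halt()
  }

// Hypothetical driver: once the process is in the Halt state,
// no further input is requested from the source.
def run[I, O](p: Process[I, O], in: LazyList[I]): LazyList[O] = p match {
  case Halt()      => LazyList.empty
  case Emit(h, t)  => h #:: run(t, in)
  case Await(recv) =>
    if (in.isEmpty) run(recv(None), LazyList.empty)
    else run(recv(Some(in.head)), in.tail)
}

// LazyList.from(1) never ends, yet the run terminates after three elements.
val firstThree = run(take[Int](3), LazyList.from(1)).toList
```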

    Efficiency

    In some respects they are again similar. Both types of transducers are demand-driven and don't generate intermediate collections. However, I'd imagine that FPiS's transducers are not as efficient when pipelined/composed, as the internal representation is more than "just a stack of function calls", as Hickey puts it. I'm only guessing here about the efficiency/performance though.

    Look into fs2 (previously scalaz-stream) for a perhaps more performant library that is based on the design of the transducers in FPiS.

    Example

    Here's an example of filter in both implementations:

    Clojure, from Hickey's talk slides:

    (defn filter
      ([pred]
        (fn [rf]
          (fn
            ([] (rf))
            ([result] (rf result))
            ([result input]
              (if (pred input)
                (rf result input)
                result)))))
      ([pred coll]
        (sequence (filter pred) coll)))
    

    In FPiS, here's one way to implement it:

    def filter[I](f: I ⇒ Boolean): Process[I, I] =
      await(i ⇒ if (f(i)) emit(i, filter(f))
                else filter(f))
    

    As you can see, filter is built up here from other combinators such as await and emit.

    Safety

    There are a number of places where you have to be careful when implementing Clojure transducers. This seems to be a design trade-off favouring efficiency. However, this downside would seem to affect mostly library producers rather than end-users/consumers.

    • If a transducer gets a reduced value from a nested step call, it must never call that step function again with input.
    • Transducers that require state must create unique state and may not be aliased.
    • All step functions must have an arity-1 variant that does not take an input.
    • A transducer's completion operation must call its nested completion operation, exactly once, and return what it returns.
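
The first two rules can be sketched in Scala. This is an analogy rather than Clojure's implementation: `Step`, `Continue`, `Reduced`, `taking` and `reduceWith` are hypothetical names, with `Reduced` standing in for Clojure's reduced wrapper.

```scala
// Result of one step: keep going, or stop early.
sealed trait Step[R] { def value: R }
case class Continue[R](value: R) extends Step[R]
case class Reduced[R](value: R) extends Step[R]

// A stateful transducer: every call to `taking` creates its own counter,
// so the state is never shared between two reductions.
def taking[A, R](n: Int)(rf: (R, A) => Step[R]): (R, A) => Step[R] = {
  var remaining = n
  (r, a) =>
    if (remaining <= 0) Reduced(r) // never calls the nested step again
    else {
      remaining -= 1
      rf(r, a) match {
        case Continue(v) if remaining == 0 => Reduced(v)
        case other                         => other
      }
    }
}

// The reducing process: stops feeding input as soon as it sees Reduced.
def reduceWith[A, R](init: R, in: Iterator[A])(rf: (R, A) => Step[R]): R = {
  var acc = init
  var done = false
  while (!done && in.hasNext) {
    rf(acc, in.next()) match {
      case Continue(v) => acc = v
      case Reduced(v)  => acc = v; done = true
    }
  }
  acc
}

val source = Iterator.from(1)
val firstThree = reduceWith(List.empty[Int], source)(
  taking[Int, List[Int]](3)((acc, x) => Continue(acc :+ x)))
```

In this sketch the burden of honouring `Reduced` is split between the transducer and the reducing process, which is the kind of care the rules above ask of implementors.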

    The transducer design from FPiS favours correctness and ease of use. The pipe composition and flatMap operations ensure that completion actions occur promptly and that errors are handled appropriately. These concerns are not a burden to implementors of transducers. That said, I imagine that the library may not be as efficient as the Clojure one.

    Summary

    Both Clojure and FPiS transducers have:

    • similar origins
    • the ability to be used in different contexts (list, streams, channels, file/network io, database results)
    • demand-driven / early termination
    • finalisation/completion (for resource safety)
    • tasty :)

    They differ somewhat in their underlying representation. Clojure style transducers seem to favour efficiency whereas FPiS transducers favour correctness and compositionality.