
Akka Stream - Parallel Processing with Partition


I'm looking for a way to implement or use a fan-out stage that takes 1 input and broadcasts it to N outputs in parallel. The difference from a plain broadcast is that I want to partition the elements, so that each element goes only to a subset of the outputs.

Example: one element may be emitted to 4 different outputs, while another element may be emitted to only 2 other outputs, depending on some function f.

source ~> partitionWithBroadcast // Outputs to some subset of [0,3] outputs
partitionWithBroadcast(0) ~> ...
partitionWithBroadcast(1) ~> ...
partitionWithBroadcast(2) ~> ...
partitionWithBroadcast(3) ~> ...

I searched the Akka documentation but couldn't find any operator that fits.

Any ideas?
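To make the requested semantics concrete, here is a plain-Scala model with no Akka involved: each element is delivered to every output index that `f` returns, and indices outside `[0, n)` are ignored. `RoutingModel` and `route` are hypothetical names chosen for this illustration; the sample elements mirror the description above, with one element reaching all 4 outputs and another reaching only 2.

```scala
// A plain-Scala model (no Akka) of the routing the question describes.
// `RoutingModel` and `route` are hypothetical names for illustration only.
object RoutingModel {
  // Output i receives, in order, every element e for which f(e) contains i.
  // Indices outside [0, n) are never asked about, so they are ignored.
  def route[E](elems: Seq[E], n: Int)(f: E => Set[Int]): Vector[Vector[E]] =
    Vector.tabulate(n)(i => elems.filter(e => f(e).contains(i)).toVector)

  def main(args: Array[String]): Unit = {
    // "****" goes to all 4 outputs, "_**_" only to outputs 1 and 2.
    def f(s: String): Set[Int] = (0 until 4).filter(s(_) == '*').toSet
    val outs = route(Seq("****", "_**_"), 4)(f)
    outs.zipWithIndex.foreach { case (o, i) => println(s"$i: ${o.mkString(" ")}") }
    // 0: ****
    // 1: **** _**_
    // 2: **** _**_
    // 3: ****
  }
}
```

A real stream stage has to produce the same per-output contents incrementally, without ever holding the whole input.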


Solution

  • What comes to mind is a FanOutShape with a filter attached to each output. NOTE: I am not using the standard Partition operator because it emits each element to exactly one output, whereas the question asks to emit to any subset of the connected outputs. E.g.:

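    import akka.NotUsed
    import akka.stream.{FanOutShape4, Graph}
    import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL}
    
    // Sends each element to every output whose index is in partitioner(e);
    // indices outside 0..3 are simply ignored.
    def createPartial[E](partitioner: E => Set[Int]): Graph[FanOutShape4[E, E, E, E, E], NotUsed] = {
      GraphDSL.create[FanOutShape4[E, E, E, E, E]]() { implicit builder =>
        import GraphDSL.Implicits._
    
        // Pair each element with the set of output indices it should reach.
        val flow = builder.add(Flow.fromFunction((e: E) => (e, partitioner(e))))
        // Copy every (element, indices) pair to all four branches.
        val broadcast = builder.add(Broadcast[(E, Set[Int])](4))
    
        // Each branch keeps only the pairs tagged with its own index, then drops the tag.
        val flow0 = builder.add(Flow[(E, Set[Int])].filter(_._2.contains(0)).map(_._1))
        val flow1 = builder.add(Flow[(E, Set[Int])].filter(_._2.contains(1)).map(_._1))
        val flow2 = builder.add(Flow[(E, Set[Int])].filter(_._2.contains(2)).map(_._1))
        val flow3 = builder.add(Flow[(E, Set[Int])].filter(_._2.contains(3)).map(_._1))
    
        flow.out ~> broadcast.in
        broadcast.out(0) ~> flow0.in
        broadcast.out(1) ~> flow1.in
        broadcast.out(2) ~> flow2.in
        broadcast.out(3) ~> flow3.in
    
        new FanOutShape4[E, E, E, E, E](flow.in, flow0.out, flow1.out, flow2.out, flow3.out)
      }
    }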
    

    The partitioner is a function that maps an element from upstream to the set of output indices it should be sent to. The graph first pairs each element with that set, then broadcasts the resulting tuple to all four branches. The flow attached to each output of the Broadcast keeps only the elements the partitioner assigned to that output and strips the set again before emitting.
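The same construction should generalize beyond four outputs. The sketch below is a hypothetical `MultiPartition` helper (not an Akka operator) that builds `n` filtered branches in a loop and exposes them through a `UniformFanOutShape` instead of a `FanOutShape4`. It assumes akka-stream (2.5 or 2.6) is on the classpath and has not been benchmarked.

```scala
import akka.NotUsed
import akka.stream.{Graph, UniformFanOutShape}
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL}

// Hypothetical n-output variant of createPartial; MultiPartition is not part of Akka.
object MultiPartition {
  def apply[E](n: Int)(partitioner: E => Set[Int]): Graph[UniformFanOutShape[E, E], NotUsed] =
    GraphDSL.create() { implicit builder =>
      import GraphDSL.Implicits._

      // Pair each element with the set of output indices it should reach.
      val tag = builder.add(Flow.fromFunction((e: E) => (e, partitioner(e))))
      // Copy every pair to all n branches.
      val broadcast = builder.add(Broadcast[(E, Set[Int])](n))
      tag.out ~> broadcast.in

      // One filtering branch per output; indices outside [0, n) match no branch.
      val outlets = (0 until n).map { i =>
        val select = builder.add(Flow[(E, Set[Int])].filter(_._2.contains(i)).map(_._1))
        broadcast.out(i) ~> select.in
        select.out
      }

      UniformFanOutShape(tag.in, outlets: _*)
    }
}
```

Inside a `GraphDSL` block it would be added as `val part = builder.add(MultiPartition[String](4)(partitioner))`, with `part.in` and `part.out(i)` as the ports to wire.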

    Then use `createPartial`, for example, like this:

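    import akka.actor.ActorSystem
    import akka.stream.ClosedShape
    import akka.stream.scaladsl.{GraphDSL, RunnableGraph, Sink, Source}
    import scala.collection.immutable
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration._
    
    // Since Akka 2.6 the implicit ActorSystem also provides the Materializer;
    // on older versions add `implicit val mat: ActorMaterializer = ActorMaterializer()`.
    implicit val system: ActorSystem = ActorSystem()
    implicit val ec: ExecutionContext = system.dispatcher
    
    // Output i receives the strings that have a '*' at position i.
    def partitioner(s: String): Set[Int] = (0 to 3).filter(s(_) == '*').toSet
    
    val src = Source(immutable.Seq("*__*", "**__", "__**", "_*__"))
    
    // Each sink collects everything its output receives.
    val sink0 = Sink.seq[String]
    val sink1 = Sink.seq[String]
    val sink2 = Sink.seq[String]
    val sink3 = Sink.seq[String]
    
    // Combine the four materialized Futures into one Future of a 4-tuple.
    def toFutureTuple[X](f0: Future[X], f1: Future[X], f2: Future[X], f3: Future[X]): Future[(X, X, X, X)] =
      for { r0 <- f0; r1 <- f1; r2 <- f2; r3 <- f3 } yield (r0, r1, r2, r3)
    
    val g = RunnableGraph.fromGraph(GraphDSL.create(src, sink0, sink1, sink2, sink3)((_, f0, f1, f2, f3) => toFutureTuple(f0, f1, f2, f3)) { implicit builder =>
      (in, o0, o1, o2, o3) => {
        import GraphDSL.Implicits._
    
        val part = builder.add(createPartial(partitioner))
    
        in ~> part.in
        part.out0 ~> o0
        part.out1 ~> o1
        part.out2 ~> o2
        part.out3 ~> o3
    
        ClosedShape
      }
    })
    
    val result = Await.result(g.run(), 10.seconds)
    println("0: " + result._1.mkString(" "))
    println("1: " + result._2.mkString(" "))
    println("2: " + result._3.mkString(" "))
    println("3: " + result._4.mkString(" "))
    system.terminate()
    
    // Prints:
    //
    // 0: *__* **__
    // 1: **__ _*__
    // 2: __**
    // 3: *__* __**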
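A different design, offered only as an alternative sketch, works around the one-output limitation of Partition mentioned above: first explode each element into `(index, element)` pairs with `mapConcat`, then let the standard `Partition` route each pair by its index. `PartitionByIndex` is a hypothetical name; `mapConcat` and `Partition` are standard Akka Streams operators, and akka-stream is assumed to be on the classpath.

```scala
import akka.NotUsed
import akka.stream.{Graph, UniformFanOutShape}
import akka.stream.scaladsl.{Flow, GraphDSL, Partition}

// Hypothetical alternative: duplicate each element once per target output,
// then route every copy to exactly one output with the standard Partition.
object PartitionByIndex {
  def apply[E](n: Int)(partitioner: E => Set[Int]): Graph[UniformFanOutShape[E, E], NotUsed] =
    GraphDSL.create() { implicit builder =>
      import GraphDSL.Implicits._

      // One (index, element) pair per valid target index, so Partition only ever
      // sees indices in [0, n); sorting just makes the emission order deterministic.
      val explode = builder.add(Flow[E].mapConcat { e =>
        partitioner(e).toList.sorted.filter(i => i >= 0 && i < n).map(i => (i, e))
      })
      val partition = builder.add(Partition[(Int, E)](n, _._1))
      explode.out ~> partition.in

      // Strip the index again on every output.
      val outlets = (0 until n).map { i =>
        val strip = builder.add(Flow[(Int, E)].map(_._2))
        partition.out(i) ~> strip.in
        strip.out
      }

      UniformFanOutShape(explode.in, outlets: _*)
    }
}
```

One trade-off, as far as I understand the operator semantics: Broadcast only emits once every output has signalled demand, while Partition only needs demand from the output the current pair is routed to. In both designs a consumer that stops pulling will eventually backpressure the whole graph.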