Search code examples
apache-kafkaapache-storm

Does Field Grouping ensure strict order?


I am a beginner with Apache Storm and wondering when the order of tuples is guaranteed in a stream. When I get this post right Processing records in order in Storm then the order between a Bolt/Spout and a other Bolt is guaranteed.

So if I have KaffkaSpout which emits Tuples which are ordered according to a timestamp and have some Bolts with field grouping according to some id.

builder.setBolt("Bolt1", bolt1).fieldsGrouping("Bolt1", new Fields("id")); 

Is it guaranteed that tuples with an id x are always processed in order for a Bolt. So Tuple1 must be processed in Bolt1 (strictly) before Tuple2 is processed in Bolt1 if they have the same id? With strictly I mean not parallel.
Is this true even when a worker node fails?


Solution

  • That depends on your topology and where does "Bolt1" lie in the topology relative to the KafkaSpout. For e.g. consider the following 2 topology cases -

    Case 1 -

    • builder.setSpout("KafkaSpout", Kafkaspout);
    • builder.setBolt("Bolt1", bolt1).fieldsGrouping("KafkaSpout", new Fields("id"));

    In this case, since bolt1 is next in topology to kafkaSpout and with field grouping, all tuples with same "id" will go to the same bolt instance, it will be strict in order. However consider the following topology

    Case 2 -

    • builder.setSpout("KafkaSpout", Kafkaspout);
    • builder.setBolt("Bolt2", bolt2).shuffleGrouping("KafkaSpout");
    • builder.setBolt("Bolt1", bolt1).fieldsGrouping("Bolt2", new Fields("id")); //id field emitted by Bolt2

    In this case, since the order is lost in Bolt2, there is no guarantee that the tuples would come to bolt 1 in the order they were pushed into Kafka partition.

    In general, if you are looking for a strict ordering of processing in Storm system, it is your responsibility to keep all the components work and emit in order. But in general this would restrict you in many ways to use the full capabilities of Storm by restricting parallelism in your code and topology.