Search code examples
hazelcast-jet

Sequential processing within a partition in hazelcast jet


Within a partition, does jet process each item sequentially, and/or is there a setting to config that way?

Thanks Shannon


Solution

  • Each source processor processes items in one external partition sequentially. For example, each Kafka partition is assigned to single processor instance, and the processor emits the data sequentially.

    However, if a downstream processor takes items from multiple upstream processors, the order is unspecified. However, items from one upstream processor can never be reordered.

    Example: let's have two vertices, A and B. A has two instances: A1, A2; B has only one instance B1. If A1 emits items I1 and I2 and A2 emits I3 and I4, B1 can receive them in any this order, but it can never receive I2 before I1 or I4 before I3. For example I3, I1, I2, I4 is possible order, but I2, I1, I3, I4 is not.

    Sequential order between two vertices will be kept in these cases:

    • both have equal local parallelism and Edge.isolated() is used.
    • both are connected to their upstream processor with partitioned edge, use the same key, have equal parallelism and both are either distributed or both are not distributed.

    Note that in these cases the downstream processor always has one upstream processor.

    Also have a look at this image (taken from here). Two Tokenize circles are two processor instances of Tokenize vertex.

    enter image description here