I have a pipeline that applies transformation rules (held in broadcast state) to a stream of events. When I run the broadcast stream and the original stream in parallel without connecting them, throughput is very good, but the moment I connect them via broadcast, performance drops drastically. How can I achieve better performance? Data passed between operators is in byte[] form, and the data footprint is small as well.
I've attached snapshots of both scenarios:
2. I've connected the broadcast stream with the data stream for processing later. Note that, to measure the performance of the broadcast alone, I've made sure no records are consumed in the data stream (top row). On the processing side of the broadcast state, I only store the received messages in MapState. With this setup I get a throughput of up to ~1000 msg/sec per task manager, processing 12 GB of data in 18 minutes. A simplified sketch of this setup is shown below.
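Here is roughly what the connected pipeline looks like (a minimal sketch, not the actual code: the stream names `eventStream` and `ruleStream` and the key derivation in `extractRuleKey` are placeholders):

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.PrimitiveArrayTypeInfo;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

// Descriptor for the MapState that holds the broadcast rules.
MapStateDescriptor<String, byte[]> ruleDescriptor = new MapStateDescriptor<>(
        "rules",
        BasicTypeInfo.STRING_TYPE_INFO,
        PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO);

// Broadcast the rule stream and connect it with the event stream.
BroadcastStream<byte[]> rules = ruleStream.broadcast(ruleDescriptor);

eventStream
    .connect(rules)
    .process(new BroadcastProcessFunction<byte[], byte[], byte[]>() {
        @Override
        public void processElement(byte[] event, ReadOnlyContext ctx,
                                   Collector<byte[]> out) {
            // Intentionally empty: no records are consumed from the
            // data stream while benchmarking the broadcast side.
        }

        @Override
        public void processBroadcastElement(byte[] rule, Context ctx,
                                            Collector<byte[]> out) throws Exception {
            // Only store the received message in the broadcast MapState.
            ctx.getBroadcastState(ruleDescriptor).put(extractRuleKey(rule), rule);
        }

        // Placeholder: the real key derivation is application-specific.
        private String extractRuleKey(byte[] rule) {
            return new String(rule);
        }
    });
```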
You've done more than simply connect the broadcast and keyed streams. Before, each event went through just one network shuffle; now there are four or five shuffles (the rebalance, hash, and broadcast connections) for each event.
Every shuffle is expensive: each one serializes records and ships them between tasks instead of passing them along an operator chain. Try to reduce the number of times you change parallelism or use keyBy.
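As a rough illustration (not your actual topology; the source and the parallelism of 4 are arbitrary): consecutive operators that share the same parallelism are chained into a single task and hand records over in-process, while every keyBy, rebalance, or broadcast edge adds a full serialize-and-ship step:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(4); // one uniform parallelism for the whole job

env.socketTextStream("localhost", 9999)  // placeholder source
   .map(String::trim)                    // same parallelism: chained, no shuffle
   .filter(s -> !s.isEmpty())            // still chained, records stay in-process
   // .keyBy(s -> s)                     // would insert a hash shuffle here
   // .rebalance()                       // would insert a round-robin shuffle here
   .print();
```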