Search code examples
apache-flinkflink-streaming

Flink object reuse: modify input objects?


I have a Flink streaming application that spends roughly 20% of its CPU time in Kyro.copy. I can evade that by turning on object reuse mode, but I have a slight problem: I'd like to modify input objects to my operators.

The general contract for object reuse mode seems to state: Do not modify input objects or remember input objects after returning from your map function. You may modify objects after output and re-emit them. (e.g.: Slide 6)

Now, my question is: If I immediately dispose of all references to objects after output-ing them from my operators, is it safe to modify input objects? Or is there some other combination of rules that can make it safe to modify input objects?


Solution

  • Yes, it would be safe. But note that immediate disposal also means that you cannot use them as a key in maps and that also means heap state backends (you can use it for lookup, but would need to create a copy on modification). So for simple map chains, it should work well, but before using joins, windows, and grouping, I'd double check it or create my own defensive copies at appropriate places.

    Btw, if you want to improve performance, it's almost always recommended to ditch Kryo serialization. Kryo would slow down any network traffic if you have any. If so, try to use POJOs, some well-supported formats like Avro, or write your own serializer. That would certainly improve performance more than object reuse. This paragraph does not apply if you don't have any network channels.