I am new to dataflow. I came across this example in the google documentation.
PCollection<String> items = ...;
PCollection<String> session_windowed_items = items.apply(
Window.<String>into(Sessions.withGapDuration(Duration.standardMinutes(10))));
1) In the above example, what would be the key used by dataflow to create windows?
2) If my input source is pubsub, should I set any message attributes and how can we specify what key dataflow should use when we go with Session based windowing.
Elements are assigned to sessions at the first grouping operation after the Window.into
. Whatever key affects the GroupByKey
, Combine.perKey
, Sum.perKey
, CoGroupByKey
, etc. operation will be the grouping key.
You do not need to set message attributes to specify the key. Instead, you would write a ParDo
to transform the existing elements into KV<K, V>
values, and the key there would be derived from that.
You may want to read about group-by-key for for more info.