I am reading a Kinesis stream with Flink. The job aggregates certain events by key over a time window. Nothing seems to happen after the reduce: no records reach the map, and nothing is written to the output CSV. I have waited many minutes, even though the time window is only two minutes.
public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
    env.enableCheckpointing(CommonTimeConstants.TWO_MINUTES.toMilliseconds());
    env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
    env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.of(1, TimeUnit.MINUTES)));

    Properties consumerConfig = new Properties();
    consumerConfig.put(ConsumerConfigConstants.AWS_REGION, PropertyFileUtils.get("aws.region", ""));
    consumerConfig.put(ConsumerConfigConstants.AWS_ACCESS_KEY_ID, PropertyFileUtils.get("aws.accessKeyId", ""));
    consumerConfig.put(ConsumerConfigConstants.AWS_SECRET_ACCESS_KEY, PropertyFileUtils.get("aws.secretAccessKey", ""));
    consumerConfig.put(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "TRIM_HORIZON");

    DataStream<APIActionLog> apiLogRecords = env.addSource(new FlinkKinesisConsumer<>(
            ProjectProperties.SOURCE_ENV_PREFIX, // stream name
            new StreamedApiLogRecordDeserializationSchema(),
            consumerConfig));

    apiLogRecords.assignTimestampsAndWatermarks(API_LOG_RECORD_BOUNDED_OUT_OF_ORDERNESS_TIMESTAMP_EXTRACTOR);

    DataStream<Tuple7<String, String, String, String, Timestamp, String, Integer>> skuPlatformTsCount =
            apiLogRecords.flatMap(collecting events...)
                    .keyBy(Key based on some parameters of the event...)
                    .timeWindow(TWO_MINUTES)
                    .reduce(adding up event parameter..., window function...)
                    .map(Map to get a different tuple format...);

    skuPlatformTsCount.writeAsCsv("/Users/uday/Desktop/out.csv", FileSystem.WriteMode.OVERWRITE);
    env.execute("Processing ATC Log Stream");
}

private static final BoundedOutOfOrdernessTimestampExtractor<APIActionLog> API_LOG_RECORD_BOUNDED_OUT_OF_ORDERNESS_TIMESTAMP_EXTRACTOR =
        new BoundedOutOfOrdernessTimestampExtractor<APIActionLog>(TEN_SECONDS) {
            private static final long serialVersionUID = 1L;

            @Override
            public long extractTimestamp(APIActionLog apiActionLog) {
                return apiActionLog.getTs().getTime();
            }
        };
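It was a silly mistake. The call
apiLogRecords.assignTimestampsAndWatermarks(API_LOG_RECORD_BOUNDED_OUT_OF_ORDERNESS_TIMESTAMP_EXTRACTOR);
does not modify apiLogRecords. It returns a new stream with timestamps and watermarks assigned, and my code discarded that return value. The rest of the pipeline was built on the original apiLogRecords, so the event-time windows never saw a watermark and never fired. The returned stream has to be used in the later operations, for example (the variable name is just illustrative):
DataStream<APIActionLog> timestampedApiLogRecords =
        apiLogRecords.assignTimestampsAndWatermarks(API_LOG_RECORD_BOUNDED_OUT_OF_ORDERNESS_TIMESTAMP_EXTRACTOR);
and then start the chain with timestampedApiLogRecords.flatMap(...) instead of apiLogRecords.flatMap(...).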
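To see why a missing watermark leaves the window silent forever, here is a rough plain-Java model of the behaviour. It is not Flink code and not how Flink is implemented internally. It only assumes two rules: the watermark trails the highest timestamp seen so far by the out-of-orderness bound, and a tumbling event-time window fires once the watermark reaches the window's last millisecond. The class name, method name, and sample timestamps are made up for illustration, and the watermark is updated after every event instead of periodically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class WatermarkSketch {
    static final long WINDOW_MS = 120_000L;          // two-minute tumbling window
    static final long OUT_OF_ORDERNESS_MS = 10_000L; // ten-second out-of-orderness bound

    // Returns the end timestamps of the windows that fire, in firing order.
    // watermarksReachWindow = false models the bug: the windowed stream
    // never receives a watermark, so the watermark stays at Long.MIN_VALUE.
    static List<Long> firedWindowEnds(long[] eventTimestamps, boolean watermarksReachWindow) {
        List<Long> fired = new ArrayList<>();
        TreeSet<Long> pendingEnds = new TreeSet<>();
        long maxTimestamp = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;

        for (long ts : eventTimestamps) {
            // Assign the event to its tumbling window (identified by its end).
            long windowEnd = ts - Math.floorMod(ts, WINDOW_MS) + WINDOW_MS;
            // Events whose window is already behind the watermark are dropped as late.
            if (windowEnd - 1 > watermark) {
                pendingEnds.add(windowEnd);
            }
            // The watermark trails the highest timestamp seen so far.
            if (watermarksReachWindow) {
                maxTimestamp = Math.max(maxTimestamp, ts);
                watermark = maxTimestamp - OUT_OF_ORDERNESS_MS;
            }
            // Fire every window whose last millisecond the watermark has passed.
            while (!pendingEnds.isEmpty() && pendingEnds.first() - 1 <= watermark) {
                fired.add(pendingEnds.pollFirst());
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        long[] timestamps = {0L, 60_000L, 125_000L, 131_000L};
        System.out.println(firedWindowEnds(timestamps, true));  // prints [120000]
        System.out.println(firedWindowEnds(timestamps, false)); // prints []
    }
}
```

With watermarks, the first window only fires once an event at 131 s pushes the watermark to 121 s, past the window's end at 120 s. Without them, no amount of waiting or extra input changes anything, which matches what I saw.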