hadoop hive apache-flink flink-streaming

How to decrease latency for Hive data ingestion in Apache Flink?

I am writing data directly to HDFS files in ORC format using Apache Flink, for a Hive table to read. Flink moves an in-progress file to the finished state only when a checkpoint completes, and only finished files are visible to the Hive table. So the latency equals the checkpoint interval (10 minutes in my case). If I shorten the checkpoint interval to reduce latency, Flink creates too many HDFS files. How can I decrease latency without creating too many files?

Solution

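The number of files scales with the sink parallelism: every parallel sink subtask that receives data writes its own part file and finishes it at each checkpoint. So if you shorten the checkpoint interval, the only thing you can do to keep the file count down is to reduce the parallelism.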
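To see the trade-off in numbers, here is a rough back-of-the-envelope sketch in plain Java (no Flink dependency). It assumes that each sink subtask finishes one part file per active bucket (Hive partition) at every checkpoint, which gives an upper bound on the file count. The class name, method name, and the sample values (parallelism 8 and 2, one active bucket) are made up for illustration; only the 10-minute interval comes from the question.

```java
public class PartFileEstimate {

    // Upper bound on finished part files per day, assuming every sink subtask
    // receives data for every active bucket between two checkpoints
    // (an assumption for illustration; idle subtasks produce no file).
    static long maxFilesPerDay(int sinkParallelism,
                               int activeBuckets,
                               int checkpointIntervalMinutes) {
        long checkpointsPerDay = (24L * 60L) / checkpointIntervalMinutes;
        return (long) sinkParallelism * activeBuckets * checkpointsPerDay;
    }

    public static void main(String[] args) {
        // Hypothetical setup: parallelism 8, one active bucket.
        System.out.println(maxFilesPerDay(8, 1, 10)); // 10-minute checkpoints
        System.out.println(maxFilesPerDay(8, 1, 1));  // 1-minute checkpoints
        System.out.println(maxFilesPerDay(2, 1, 1));  // 1-minute checkpoints, parallelism 2
    }
}
```

Under these assumptions, cutting the interval from 10 minutes to 1 minute multiplies the file count by ten, and lowering the sink parallelism from 8 to 2 cuts it by a factor of four.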