apache-flink, flink-streaming

Is it possible to add a partition every hour while a Flink job writes data to HDFS files?


Currently, my Flink job writes data directly to HDFS files in ORC format for Hive, but a new partition has to be added to the Hive table every hour. Is there any way to trigger a function every hour?


Solution

  • Sure, you can use a KeyedProcessFunction with a timer that fires every hour. Flink timers are one-shot, so the function has to register the next timer each time one fires. Or you can write a custom sink that implements ProcessingTimeCallback (or maybe extend the sink you're using for Hive to do this?). You could also implement a custom source that emits an event once an hour.

    ProcessFunction

    ProcessingTimeCallback
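
For the timer option, the hourly logic reduces to two small pieces: computing the next whole-hour timestamp to register, and building the partition statement when the timer fires. The sketch below shows only those pieces in plain Java. The class name, method names, the UTC time zone, and the `dt`/`hr` partition columns are all assumptions for illustration; adjust them to your table. Inside a KeyedProcessFunction you would call `ctx.timerService().registerProcessingTimeTimer(nextHourBoundary(ctx.timerService().currentProcessingTime()))` from `processElement`, then run the statement and register the next timer from `onTimer`.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Flink or Hive.
// Assumes a table partitioned by dt (yyyy-MM-dd) and hr (00-23) in UTC.
final class HourlyPartition {
    static final long HOUR_MS = 60L * 60L * 1000L;

    // Returns the first whole-hour boundary strictly after nowMs (epoch millis).
    // This is the timestamp to pass to registerProcessingTimeTimer.
    static long nextHourBoundary(long nowMs) {
        return Math.floorDiv(nowMs, HOUR_MS) * HOUR_MS + HOUR_MS;
    }

    // Builds the Hive statement for the hour that contains timerMs.
    // IF NOT EXISTS keeps the statement safe to repeat after a job restart.
    static String addPartitionDdl(String table, long timerMs) {
        ZonedDateTime t = Instant.ofEpochMilli(timerMs).atZone(ZoneOffset.UTC);
        String dt = t.format(DateTimeFormatter.ISO_LOCAL_DATE);
        String hr = t.format(DateTimeFormatter.ofPattern("HH"));
        return "ALTER TABLE " + table
                + " ADD IF NOT EXISTS PARTITION (dt='" + dt + "', hr='" + hr + "')";
    }
}
```

How the statement reaches Hive (JDBC, the metastore client, or something else) is left open here, since it depends on your setup.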