Search code examples
javaapache-flinkflink-streaming

Solving for Scheduled Processing with Apache Flink


We have about 500 million drivers in 12 timezones. We send different communications such as their earnings report, new promotions, policy change updates etc., periodically. We want these communications to be delivered to them at a time that works best for them. For example - 9AM local time. We'd like to generate these communications early and publish them to Flink and schedule them for delivery based at appropriate times.

Messages will be in the following format - {message_id, message, scheduled_time_in_utc}

For simplicity, we can assume that scheduled_time_in_utc is always 1 hour granularity. How do we go about achieving this functionality with Flink?

Any pointers are appreciated! Thanks!


Solution

  • I'm assuming you can push these communications in Kafka, which Flink would use as its source, and that you've got some async I/O method of sending out the communications (e.g. an SMS service). If so, then you could have a workflow that:

    1. Reads the messages from Kakfa
    2. Key by message_id
    3. .process(new ReleaseTimedMessages())
    4. Use Flink's AsyncIO support to send the message

    Where the ReleaseTimedMessages KeyedProcessFunction would save the msg in state, and set up a process time timer to fire at the target delivery time. When it fires, emit the message and clear the timer.