Tags: apache-flink, flink-streaming

Using external values as conditions in Apache Flink


I’m building an application that needs to aggregate measures coming from a series of sensors deployed in different areas. The measures are ingested through Kafka. I'm new to Flink, but I have already figured out how to aggregate events with a window and send them to a sink. However, I also need to compare the aggregated values per area with thresholds (also per area) coming from an external database (in my case Postgres). These thresholds can be updated over time, or added whenever a new area is created. Any suggestions? Thanks, euks


Solution

  • To stream in the thresholds from Postgres, you could set up a table source connected to Postgres via Debezium. https://www.youtube.com/watch?v=wRIQqgI1gLA is a good resource for learning about that. Alternatively, you could do a lookup join, with caching (if that's acceptable for your freshness requirements).
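    As an illustrative sketch (not a drop-in solution): the DDL below defines a changelog table over a hypothetical Postgres `thresholds` table using the `postgres-cdc` connector from the Flink CDC project. All table, column, and connection values here are invented for the example; the primary key and watermark are what later let the table be used as a versioned table.

    ```java
    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class ThresholdSourceJob {
        public static void main(String[] args) {
            TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

            // Every INSERT/UPDATE on the Postgres table becomes a changelog
            // row in Flink, so threshold edits flow in continuously.
            tEnv.executeSql(
                "CREATE TABLE thresholds (" +
                "  area_id STRING," +
                "  max_value DOUBLE," +
                "  updated_at TIMESTAMP(3)," +
                "  PRIMARY KEY (area_id) NOT ENFORCED," +
                "  WATERMARK FOR updated_at AS updated_at" +
                ") WITH (" +
                "  'connector' = 'postgres-cdc'," +
                "  'hostname' = 'localhost'," +
                "  'port' = '5432'," +
                "  'username' = 'flink'," +
                "  'password' = 'secret'," +
                "  'database-name' = 'sensors'," +
                "  'schema-name' = 'public'," +
                "  'table-name' = 'thresholds'," +
                "  'slot.name' = 'flink_thresholds'" +
                ")");
        }
    }
    ```

    Note this requires the `flink-connector-postgres-cdc` dependency on the classpath and logical replication enabled on the Postgres side (`wal_level = logical`).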

    Once you've got the thresholds streaming in, then you can, for example, model the changelog stream of thresholds as a versioned table, and use a temporal join to combine it with the primary stream.
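    For example, assuming a thresholds changelog table with primary key `area_id` and a watermark on `updated_at`, and an aggregated-measures table `measures` with an event-time attribute `ts` (all names hypothetical, and `tEnv` being your `TableEnvironment`), a temporal join could look roughly like:

    ```java
    // Each aggregate is matched against the threshold version that was
    // valid as of the measure's event time, so threshold updates take
    // effect without restarting the job.
    tEnv.executeSql(
        "SELECT m.area_id, m.avg_value, t.max_value, " +
        "       m.avg_value > t.max_value AS breached " +
        "FROM measures AS m " +
        "JOIN thresholds FOR SYSTEM_TIME AS OF m.ts AS t " +
        "ON m.area_id = t.area_id");
    ```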

    If for some reason that doesn't get you what you want, you could convert the dynamic tables to DataStreams and use a KeyedCoProcessFunction to implement a lower-level solution (more work, but more flexible). Or, if there is no shared key between the two streams, you can instead broadcast the thresholds to all parallel instances and keep them in broadcast state.
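    A minimal sketch of the KeyedCoProcessFunction variant, assuming both streams are keyed by area id and using hypothetical POJOs `AggregatedMeasure(areaId, value)` and `Threshold(areaId, maxValue)` that are not from the question:

    ```java
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
    import org.apache.flink.util.Collector;

    public class ThresholdCheck
            extends KeyedCoProcessFunction<String, AggregatedMeasure, Threshold, String> {

        // Latest known threshold for the current key (area).
        private transient ValueState<Double> maxValue;

        @Override
        public void open(Configuration parameters) {
            maxValue = getRuntimeContext().getState(
                new ValueStateDescriptor<>("max-value", Double.class));
        }

        // Stream 1: windowed aggregates, keyed by area.
        @Override
        public void processElement1(AggregatedMeasure m, Context ctx, Collector<String> out)
                throws Exception {
            Double limit = maxValue.value();
            if (limit != null && m.value > limit) {
                out.collect("Area " + m.areaId + " exceeded threshold " + limit);
            }
            // If no threshold has arrived yet for this area, one option is to
            // buffer the measure in state and re-check it in processElement2.
        }

        // Stream 2: threshold updates (e.g. from the CDC source), keyed by area.
        @Override
        public void processElement2(Threshold t, Context ctx, Collector<String> out)
                throws Exception {
            maxValue.update(t.maxValue);
        }
    }
    ```

    Wiring would look like `measures.connect(thresholds).keyBy(m -> m.areaId, t -> t.areaId).process(new ThresholdCheck())`. Keying both streams the same way is what lets the threshold state be read when a measure for that area arrives.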