Search code examples
apache-kafkamenudecision-treeflink-streaming

Kafka: how could several options be picked up modulated by users via a menu or decision tree?


I am grouping input data based on key, then doing 1 minute window with 30 seconds hop and in aggregator.

Data is being sent and consumed by an application, and the need for this application could evolve in the future, therefore, I see a need for future flexibility and quick change.

The current logic is described below:

@StreamListener("input")
    public void process(KStream<String, Data> DataKStream) {

        JsonSerde<DataAggregator> DataJsonSerde =
                new JsonSerde<>(DataAggregator.class);

        DataKStream
                .groupByKey()
                .windowedBy(TimeWindows.of(60000).advanceBy(30000))
                .aggregate(
                        DataAggregator::new,
                        (key, Data, aggregator) -> aggregator.add(Data),
                        Materialized.with(Serdes.String(), DataJsonSerde)
                );
    }
DataAggregator.java

public class DataAggregator {

    private List<String> dataList = new ArrayList<>();

    public DataAggregator add(Data data) {
        dataList.add(data.getId());
        System.out.println(dataList);
        return this;
    }

    public List<String> getDataList() {
        return dataList;
    }
}

However, given evolving requirements I would like to give users the possibility to change the logic via a menu.

For example, users could change the window at wish or change the way data is segregated.

I was potentially thinking of writing several java classes which could be turned on and off when users pick specific options.

But I am wondering if something better and more dynamic could be done.


Solution

  • With Flink, some things can't be changed while a job is running -- notable, the topology of the job graph, and the parallelism of the operators.

    On the other hand, a control stream can be broadcast throughout the cluster to effect dynamic changes to the business logic. In simple cases this has been used to modify filter parameters; in more complex cases it has been used, for example, to trigger dynamic loading of code or machine learning models (e.g., by broadcasting PMML) used in transformations.

    Sample use cases: RBEA: Scalable Real-Time Analytics at King, StreamING models, how ING adds models ... .

    What's less obvious is how dynamically reconfigure aggregations. The open source Fraud Detection Demo (part 1, part 2, github) illustrates how to accomplish that.

    For another example, see Cogynt: Flink without code.