state apache-flink flink-streaming stream-processing flink-statefun

Stateful Functions in Apache Flink

I examine new Stateful Functions 2.0 API of Apache Flink. I read following documentation link https://ci.apache.org/projects/flink/flink-statefun-docs-stable/. Also I ran examples in Git repo. (https://github.com/ververica/stateful-functions/tree/master/stateful-functions-examples) I have few questions about implemantation.

https://flink.apache.org/stateful-functions.html --> There is an example which is Transaction Scoring for Fraud Detection at the end of the page.

First question is about state TTL. How can I give to state to TTL? Example says: After 30 days, the “Fraud Count” function will receive an expiration message (from itself) and clear its state. Should I do this manual or is there another feature? How can I do this manual?

Second Question about keyedstream. Example says: multiple instances of “Fraud Count” will exist — for example, one per customer account. Should I put values to PersistedTable<K,V>? For example <customerid,count>. Can I clear state to specific key?

Last Question is about windowing and watermark. How can I implement theese feature to Stateful Functions 2.0?

Solution

First question is about state TTL. How can I give to state to TTL? Example says: After 30 days, the “Fraud Count” function will receive an expiration message (from itself) and clear its state. Should I do this manually or is there another feature? How can I do this manual?

You can do this manually using delayed message. In effect, you can create a call back trigger by sending yourself a message on a delay. This message is durable and will not be lost in case of failure. If you look at the fraud count function, in the model serving example, you will see that it does exactly this. When a value is received a ttl message is sent with a 30 day delay. When that message is received the count is decremented.

Second Question about keyedstream. Example says: multiple instances of “Fraud Count” will exist — for example, one per customer account. Should I put values to PersistedTable? For example . Can I clear state to specific key?

All function instances are "keyed", in that user code is always invoked within the scope of a key, and all Persisted fields are scoped to that key. The key is the "id" component of an address. In your example, you could have a function "CustomerFunction" that tracks information on each customer of your buisness. When you want to interact with that customer, you will message it specifying that customers uid as the "id" of the address.

new Address(new FunctionType("ns", "customer"), "customer-id-1");

If you are tracking a count per customer, you only need a PersistedValue, since it is already scoped to that customer id. Going back to the fraud count example, that function is scoped on "account id", it is tracking the number of fraudulent transactions per bank account.

Last Question is about windowing and watermark. How can I implement theese feature to Stateful Functions 2.0?

These features are not directly supported in statefun 2.0. The reason for windows is that they are mostly applicable to data processesing, not application development. For those use cases, you are likely better served using Flink's DataStream and Table API's though it is possible to implement them yourself in user code.

Event time is tricky. Event time is uses "watermarks" under the hood to track the progression of time within the system. They depend on data being well ordered in relation to their watermarks. Meaning if event x is ingested with a timestamp of 1:59 in front of a watermark for 2:00, it must always stay in front of that watermark. Otherwise, this on-time record will eroneously be labeled as late.

Stateful Functions is based on iteration and arbitrary message passing. Because records can go in any direction through the dataflow, event time is not well defined.