We are considering using Beam/Dataflow for stateful processing:
Example: get max price article bought for each 1 mio clients since registered on a portal
Also, we would also like to access those calculated aggregates while not interfering with the real-time job.
Design question : can it be covered by the current state back-end - Windmill/Persistent Disks [1] - or would the use of a database (like BigTable) be a better fit ?
Thanks !
It is actually possible to define Big Table connectors in Dataflow to perform reading and writing operations. Moreover, there is the project.jobs.get method of the Dialogflow API that returns an instance of a job which is a json response containing also the “currentState” field. Therefore I think you could build a sort of automation script to get this field value and then store it in Big Table database using the Big Table connectors, however it is a quite complex solution and I am not sure if it could be convenient.