Tags: postgresql, scala, apache-spark, spark-structured-streaming

Spark Structured Streaming, PostgreSQL, and updateStateByKey


How can the state of an OUTPUT table be updated by a Spark Structured Streaming computation that is triggered by changes in an INPUT PostgreSQL table?

As a real-life scenario: the USERS table has been updated for user_id = 0002. How can the Spark computation be triggered for that user only, and the results written or updated to another table?


Solution

  • Although there is no out-of-the-box solution, you can implement it the following way.

    You can use LinkedIn's Databus or a similar change-data-capture tool, which mines the database logs and produces corresponding events to Kafka. The tool tracks changes in the database bin logs. You can write a Kafka connector to transform and filter the data, then consume the events from Kafka and process them into any sink format you want. A sketch of the Spark side of such a pipeline is shown below.
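
    The following is a minimal sketch, not a definitive implementation. It assumes the CDC tool is already publishing change events for the USERS table as JSON to a Kafka topic; the topic name (`users-cdc`), the event schema, the JDBC connection details, and the output table name (`user_results`) are all illustrative assumptions, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object UsersCdcToPostgres {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("users-cdc-to-postgres")
      .getOrCreate()
    import spark.implicits._

    // Assumed JSON payload of each change event produced by the CDC tool.
    val eventSchema = new StructType()
      .add("user_id", StringType)
      .add("name", StringType)
      .add("updated_at", TimestampType)

    // Read the change events from the (assumed) Kafka topic.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "users-cdc")
      .load()
      .select(from_json($"value".cast("string"), eventSchema).as("e"))
      .select("e.*")

    // Restrict processing to the single user from the question.
    val filtered = events.filter($"user_id" === "0002")

    // Write each micro-batch to the output table over JDBC.
    // foreachBatch exposes the batch as a plain DataFrame, so you can
    // append here, or replace this with your own upsert logic.
    val writeBatch: (DataFrame, Long) => Unit = (batch, _) => {
      batch.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://localhost:5432/mydb")
        .option("dbtable", "user_results")
        .option("user", "spark")
        .option("password", "secret")
        .mode("append")
        .save()
    }

    val query = filtered.writeStream
      .foreachBatch(writeBatch)
      .start()

    query.awaitTermination()
  }
}
```

    The foreachBatch sink is used here because Structured Streaming has no built-in streaming JDBC sink; each micro-batch is written with the regular batch JDBC writer, and you can swap the append for an upsert against PostgreSQL if the output table keeps one row per user.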