databricks, delta-live-tables

delta live tables dump final gold table to cassandra


We have a Delta Live Tables pipeline that reads from a Kafka topic, cleans/filters/processes/aggregates the messages, and writes them to bronze/silver/gold tables. To build a REST service that serves the aggregated results, we need to copy the data from the gold table to a Cassandra table. I tried to update the script for the gold table: after the aggregated result is written to gold, I added one more step to write the updated result to Cassandra, but it didn't work:

@dlt.table
def test_live_gold():
    return (
        dlt.read("test_kafka_silver")
            .groupBy("user_id", "event_type")
            .count()
    )
    # The extra step I tried to add (did not work):
    # df = (spark.read.format("delta")
    #       .table("customer.test_live_gold")
    #       .withColumnRenamed("user_id", "account_id")
    #       .withColumnRenamed("event_type", "event_name")
    #       .withColumn("last_updated_dt", current_timestamp()))
    # df.show(5, False)
    # write_to_cassandra_table('customer', 'test_keyspace', df)

How can I copy the result from the Delta table to Cassandra in the same workflow as the Delta Live Tables pipeline?


Solution

  • By default, Delta Live Tables only stores data as Delta tables. If you need to write the data somewhere else, add another task to your job (Databricks workflow) that uses a notebook to read the data from the gold table produced by test_live_gold and write it into Cassandra. Something like this:

    (screenshot: Databricks workflow with the DLT pipeline task followed by a notebook task)
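The notebook task could be sketched roughly as follows. This is only an illustration, not a confirmed implementation: it assumes the Spark Cassandra Connector is installed on the cluster and configured with the Cassandra contact points, and it reuses the table, keyspace, and column names from the question (`customer.test_live_gold`, `test_keyspace`), which you would adjust to your setup.

```python
def export_gold_to_cassandra(spark):
    """Read the gold table produced by the DLT pipeline and append it to
    Cassandra via the Spark Cassandra Connector.

    Assumptions (from the question, adjust as needed):
      - gold table:        customer.test_live_gold
      - target keyspace:   test_keyspace
      - target table:      customer
    """
    # Import inside the function so the module can be loaded without pyspark.
    from pyspark.sql import functions as F

    # Read the gold table and rename columns to match the Cassandra schema.
    df = (
        spark.read.table("customer.test_live_gold")
        .withColumnRenamed("user_id", "account_id")
        .withColumnRenamed("event_type", "event_name")
        .withColumn("last_updated_dt", F.current_timestamp())
    )

    # Append into Cassandra using the Spark Cassandra Connector data source.
    (
        df.write.format("org.apache.spark.sql.cassandra")
        .options(keyspace="test_keyspace", table="customer")
        .mode("append")
        .save()
    )
```

In the Databricks workflow you would then add this notebook as a task that depends on the DLT pipeline task, so the export runs only after the gold table has been refreshed.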