Search code examples
azure-databricksazure-synapsedelta-lake

How to update two columns in delta lake? Doing INSERT and UPDATE to delta lake like a database for operational data


Let's I have a file/table Values in Azure Databricks Delta Lake that has the following contents

ID Value Version
1 0 0
2 0 0
3 0 0

Now it is read to a user interface by users User1 and User2.

At moment T = 0 User1 updates the file as follows in SQL syntax UPDATE Values SET Value = '1' WHERE ID = '1' AND Version = 0;

Later at moment T = 1 User2 update the file as follows in SQL syntax UPDATE Values SET Value = '1' WHERE ID = '0' AND Version = 0;

In this case, I assume the update by User2 succeeds, since it matches the predicate.

Could one automatically update the version number within the same transaction with Delta Lake? So something like

UPDATE Values SET Value = '1', Version = Version + 1 WHERE ID = '1' AND Version = 0; or in this case have Version autoincrement?

I wonder if this is possible or am I missing something simple here (maybe I am, it's late).

The aim here is for the User2 detect if the row in that file has been modified by someone else and then issue a notification of it. So, in essence to use Delta Lake as a transactional database (whether it's right or wrong).

I put both "delta lake" and Synapse here since one can map Delta Lake tables in Synapse. But I'm most interested Azure Data Bricks Delta Lake.


Solution

  • Delta Lake is currently designed to provide transactions for OLAP/data warehousing designs. Saying this, there is currently no concept of RDBMS triggers that could update the version number based on the existing data.

    Saying this, note that Delta Lake internally maintains table versions (VERSION) within the context of Delta Lake time travel, here's an example syntax:

    SELECT * FROM VALUES VERSION AS OF 0;
    

    Saying this, for your scenario, you can probably build a query utilizing Delta Lake Change Data Feed where you can query the change data feed to look for any changes to your table and/or you can query time travel history to determine if there are any changes.