python azure-databricks watermark

How to create a watermark table in Databricks


I would like to create a watermark table in Databricks with one column (version) whose value is 1. This will be the starting point. Every time the Python script finishes running, I want to increment the value by 1.

The goal is to read this value later, before the Python code runs.


Solution

  • First, create a watermark table in Databricks with a column named "version" and set its value to 1.

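    %sql
    -- UPDATE is only supported on Delta tables, so declare the format explicitly.
    CREATE TABLE watermarktable (
      version INT
    ) USING DELTA;
    -- Seed the single row that holds the current version.
    INSERT INTO watermarktable VALUES (1);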
    

    To increment the value of the "version" column by 1 after each run of the Python script, add the following line at the end of the script.

    spark.sql("UPDATE watermarktable SET version = version + 1")
    

    This increments the value of the "version" column by 1 each time the Python script runs through to its last line.
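The question also asks to use the value before the Python code runs. A minimal sketch of that read-then-increment flow is below. It assumes the `watermarktable` created above and the `spark` session that Databricks notebooks provide. The helper names `read_version`, `bump_version`, and `run_with_watermark` are hypothetical, introduced here only for illustration.

```python
def read_version(spark, table="watermarktable"):
    """Return the current watermark value, or None if the table has no rows."""
    # MAX() yields a single row even for an empty table (with a NULL value).
    row = spark.sql(f"SELECT MAX(version) AS version FROM {table}").first()
    return row["version"] if row is not None else None


def bump_version(spark, table="watermarktable"):
    """Increment the watermark by 1 (UPDATE needs a Delta table)."""
    spark.sql(f"UPDATE {table} SET version = version + 1")


def run_with_watermark(spark, job, table="watermarktable"):
    """Read the watermark, call job(version), and bump only if job succeeds."""
    version = read_version(spark, table)
    result = job(version)  # an exception raised here skips the bump below
    bump_version(spark, table)
    return result
```

In a notebook this could be called as `run_with_watermark(spark, my_job)`, where `my_job` is a placeholder for whatever function holds the script's logic and accepts the version as its argument. Because the increment happens after `job` returns, a failed run leaves the watermark unchanged.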