Search code examples
azureunix-timestampepochwatermarkincremental-load

Azure Delta Load won't recognize Epoch timestamp (ms) as Watermark Column Name


I am trying to perform a delta load (incremental load) with Azure Data Factory from SQL Server to Blob Storage. My tables have an updateStamp column that is Epoch Time in milliseconds, numeric(19,0) data type. When I look to select the Watermark column name in the configuration section of the Copy Data tool in ADF, it is not one of the options and it does not let me manually enter the column name. It looks like it only wants a datetime data type or key integer data type. I have tried with the Metadata-driven copy task and the Delta copy from Database template with no luck. Is there a work around or way of converting the max and using that (instead of adding another column to hundreds of millions of rows). Any help or guidance is appreciated.

I'm expecting to be able to use a data type that indicates a point in time as the watermark for an incremental load, even though that data type is not datetime.


Solution

  • I have tried to repro this in my environment using Delta copy from a database template in adf for the watermark column with epoch timestamp type. Below are the steps.

    • Input table and watermark table are taken as in below image. (Initial Watermark value for watermark table is given as 1657238400000, to copy all records in first run)

    enter image description here

    • Stored Procedure for updating new watermark value is written in SQL server as in below script.
    Create PROCEDURE update_watermark @LastModifyDate numeric(19,0)
    AS
    BEGIN
        UPDATE watermarktable
        SET [WatermarkValue] = @LastModifyDate
    END
    
    • In ADF, delta copy from database template is taken and linked service for source, sink and control table are given. Then, Use template is selected.

    enter image description here

    • Configuration in LookupLastWaterMark, LookupCurrentWaterMark, DeltaCopyfromDB activities are not changed

    • In UpdateWaterMark activity, Stored procedure name is selected, and parameter is imported. type of the parameter LastModifyDate is given as Int64. enter image description here

    • Debug is clicked for the pipeline run and pipeline parameters for Source, sink and control table are given.

    enter image description here

    • Once file is copied, watermark value is updated with the latest value. enter image description here

    Sink File:

    enter image description here

    • New line item is added to the source (4th record in the below image is added newly).

    enter image description here

    • Pipeline is rerun to check if delta lines are copied. enter image description here

    Delta records are copied to sink when epoch timestamp is given as watermark column.

    Reference: MS doc on Delta copy from a database template.