Tags: azure-data-factory, upsert, sql, geography

Azure Data Factory Copy Pipeline with Geography Data Type


I am trying to copy a geography data type from a production DB to another DB on a nightly schedule. I really wanted to leverage upsert as the write behavior, but it seems that geography is not supported with this method. I was reading a similar post about bringing the data through ADF as well-known text (WKT) and then converting it, but I keep getting confused about what to do with the data once it is brought over as WKT. I would appreciate any advice, thank you.

I tried to utilize ADF pipelines and data flows. I also tried to convert the data type once it was in the destination, but then I was not able to run the pipeline again.
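(For context, the well-known text approach mentioned above usually means copying the data into a staging table as plain text and then converting it back to geography in the destination. Below is a minimal sketch of that pattern; the table and column names SpatialStaging, SpatialTarget and GeogWkt, and the SRID 4326, are assumptions for illustration only.)

    -- Hypothetical staging table the copy activity can write WKT into.
    CREATE TABLE SpatialStaging
    (
        id      int NOT NULL,
        GeogWkt nvarchar(max) NOT NULL   -- e.g. GeogCol1.STAsText() from the source query
    );
    
    -- Hypothetical target table with the real geography column.
    CREATE TABLE SpatialTarget
    (
        id       int PRIMARY KEY,
        GeogCol1 geography
    );
    
    -- After each copy run, convert the WKT back to geography and upsert by id.
    UPDATE t
    SET    t.GeogCol1 = geography::STGeomFromText(s.GeogWkt, 4326)
    FROM   SpatialTarget AS t
    JOIN   SpatialStaging AS s ON s.id = t.id;
    
    INSERT INTO SpatialTarget (id, GeogCol1)
    SELECT s.id, geography::STGeomFromText(s.GeogWkt, 4326)
    FROM   SpatialStaging AS s
    WHERE  NOT EXISTS (SELECT 1 FROM SpatialTarget AS t WHERE t.id = s.id);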


Solution

    • I tried to upsert data with the geography datatype from one Azure SQL database to another using the copy activity and got an error message.


    • Then, I did the upsert using a Data Flow activity. Below are the steps.

    • A source table is created as below and taken as the source in the dataflow.

    CREATE TABLE SpatialTable
    (
        id       int,
        GeogCol1 geography,
        GeogCol2 AS GeogCol1.STAsText()   -- computed column exposing the geography value as WKT
    );
    
    INSERT INTO SpatialTable (id, GeogCol1)
    VALUES (1, geography::STGeomFromText('LINESTRING(-122.360 46.656, -122.343 46.656)', 4326));
    
    INSERT INTO SpatialTable (id, GeogCol1)
    VALUES (2, geography::STGeomFromText('POLYGON((-122.357 47.653, -122.348 47.649, -122.348 47.658, -122.358 47.658, -122.358 47.653))', 4326));
    

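    On the target database, the sink also needs a table to upsert into. A minimal sketch of a possible target table is shown below, assuming it mirrors the source; the name SpatialTableSink is an assumption for illustration.

    -- Hypothetical sink table; only id and GeogCol1 are mapped in the dataflow sink,
    -- since a computed column like GeogCol2 cannot be written to.
    CREATE TABLE SpatialTableSink
    (
        id       int PRIMARY KEY,   -- key column used for the upsert
        GeogCol1 geography
    );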

    • Then, an Alter Row transformation is added and in the Alter Row conditions, Upsert if isNull(id)==false() is given. (Based on the column id, the sink table is upserted.)

    • Then, in the Sink, the dataset for the target table is given. In the sink settings, the Update method is set to Allow upsert and the required key column is given (here the column id is selected); the sketch after this step shows roughly what that amounts to in SQL.
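    For intuition only, an upsert on key column id behaves roughly like the MERGE below. This is a hand-written sketch, not the SQL that ADF actually generates, and the @incoming table variable stands in for the rows arriving from the dataflow.

    -- Illustrative only: roughly what "Allow upsert" with key column id amounts to.
    DECLARE @incoming TABLE (id int, GeogCol1 geography);
    INSERT INTO @incoming (id, GeogCol1)
    VALUES (1, geography::STGeomFromText('LINESTRING(-122.360 46.656, -122.343 46.656)', 4326));
    
    MERGE SpatialTableSink AS t
    USING @incoming AS s
        ON t.id = s.id
    WHEN MATCHED THEN
        UPDATE SET t.GeogCol1 = s.GeogCol1
    WHEN NOT MATCHED THEN
        INSERT (id, GeogCol1)
        VALUES (s.id, s.GeogCol1);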

    • When the pipeline is run for the first time, the data is inserted into the target table.

    • When the pipeline is run for the second time, after updating existing data and inserting new records in the source, the data is upserted correctly.

    The source data is changed for id=1 and a new row is inserted with id=3.

    The sink data reflects the changes made in the source, which can be checked with a query like the one below.
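    A minimal verification query against the sink, assuming the table name SpatialTableSink from the sketch above:

    -- After the second run, expect the updated geometry for id = 1,
    -- the unchanged row for id = 2 and the new row for id = 3.
    SELECT id, GeogCol1.STAsText() AS GeogWkt
    FROM   SpatialTableSink
    ORDER BY id;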