
Copy from Azure Storage Explorer in Azure Data Factory: query table beforehand


I have in Azure Storage Explorer a CSV file that is updated daily from an API request. Using Azure Data Factory, I want to copy this table to PostgreSQL, which can be done with a Copy activity. However, I want to make some modifications beforehand, because otherwise the table in Postgres will become too large. Example of the table:

ID   ReadDate                   MeterName  Value
AAA  2024-11-13 22:00:00-05:00  Meter1     20
AAA  2024-11-13 16:00:00-05:00  Meter1     10
BBB  2024-11-13 22:00:00-05:00  Meter1     40
BBB  2024-11-13 22:00:00-05:00  Meter2     60

I want to:

  • Get only rows with Meter1 reads.
  • Get the latest values per ID.

So in the table above I only need rows 1 and 3.
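
Expressed as a PostgreSQL query, the result I am after would look roughly like this (the table name meter_reads is just a placeholder):

    SELECT DISTINCT ON (id)
           id, readdate, metername, value
    FROM meter_reads
    WHERE metername = 'Meter1'
    ORDER BY id, readdate DESC;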

Since the table is updated daily via the API, I would like to overwrite the existing values in PostgreSQL with the newest ones. We have to take into account, though, that on some days a meter read can be missing for an ID.

Are these modifications somehow possible in Azure Data Factory before copying the data to PostgreSQL?


Solution

  • You can use a mapping data flow to achieve this requirement as follows:

    Create a source dataset for your CSV file and add it to a source transformation. Then add a filter transformation after the source to keep only the rows where MeterName is Meter1, using this condition:

    equals(MeterName, 'Meter1')
    


    The data preview of the filter transformation will now show only the Meter1 rows.

    Next, add an aggregate transformation to reduce the data to the latest row per ID. Configure it as follows:

    • Group by: the ID and MeterName columns


    • Aggregates: ReadDate - max(ReadDate), Value - first(Value). Note that first(Value) is only guaranteed to return the value belonging to the latest read if the incoming rows are ordered, so add a sort transformation on ReadDate in descending order between the filter and the aggregate.


    You will get the required data: one row per ID with the latest Meter1 read.
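
    For reference, this filter-sort-aggregate combination corresponds roughly to the following PostgreSQL query (the table name meter_reads is again a placeholder):

        -- Latest Meter1 read per ID: max(ReadDate) together with the
        -- Value of that latest row, picked via an ordered array_agg.
        SELECT id,
               metername,
               max(readdate) AS readdate,
               (array_agg(value ORDER BY readdate DESC))[1] AS value
        FROM meter_reads
        WHERE metername = 'Meter1'
        GROUP BY id, metername;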

    To overwrite the existing rows in the PostgreSQL table, add an Alter row transformation, select Upsert if, and write your condition (for example true() to upsert every incoming row). Then add the PostgreSQL dataset to the sink transformation, enable Allow upsert, and select the key column (ID in this case, since the stream now has one row per ID).

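    With this setup the sink effectively performs a per-row upsert. In PostgreSQL terms the effect is similar to the statement below (the table name meter_reads_latest is a placeholder, and ON CONFLICT requires a unique or primary key constraint on id). Note that IDs missing from a given day's file are simply left untouched, which covers the case of missing meter reads:

        INSERT INTO meter_reads_latest (id, readdate, metername, value)
        VALUES ('AAA', '2024-11-13 22:00:00-05:00', 'Meter1', 20)
        ON CONFLICT (id)
        DO UPDATE SET readdate = EXCLUDED.readdate,
                      value    = EXCLUDED.value;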

    Debug the pipeline with a Data flow activity that runs this data flow; the data will be copied to the table successfully.
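
    As a quick sanity check after the run, you can confirm that the table contains exactly one (latest) row per ID, again assuming the placeholder table name:

        SELECT id, metername, readdate, value
        FROM meter_reads_latest
        ORDER BY id;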