I have in Azure Storage (browsed via Azure Storage Explorer) a CSV file that is updated daily from an API request. Using Azure Data Factory I want to copy this table to PostgreSQL, which can be done with a Copy activity. However, I want to do some modifications beforehand, because otherwise the table in Postgres will become too large. Example of the table:
ID | ReadDate | MeterName | Value |
---|---|---|---|
AAA | 2024-11-13 22:00:00-05:00 | Meter1 | 20 |
AAA | 2024-11-13 16:00:00-05:00 | Meter1 | 10 |
BBB | 2024-11-13 22:00:00-05:00 | Meter1 | 40 |
BBB | 2024-11-13 22:00:00-05:00 | Meter2 | 60 |
I want to:

- keep only the rows where MeterName is Meter1;
- for each ID, keep only the row with the latest ReadDate.

So in the table above I only need rows 1 and 3.
Since the table is updated daily via the API, I would like to overwrite the newest values in PostgreSQL. But we have to take into account that on some days a meter read can be missing for an ID.
Are these modifications somehow possible in Azure Data Factory before copying the data to PostgreSQL?
You can use a mapping data flow to achieve this, as follows:
Create a dataset for your CSV file and use it in a source transformation. Then add a filter transformation after the source to keep only the rows where MeterName is Meter1, using this condition:

equals(MeterName, "Meter1")
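For reference, a minimal sketch of the source and filter steps in data flow script form, assuming the column types shown (stream names such as source1 and FilterMeter1 are placeholders):

```
source(output(
        ID as string,
        ReadDate as timestamp,
        MeterName as string,
        Value as integer
    ),
    allowSchemaDrift: true,
    validateSchema: false) ~> source1
source1 filter(equals(MeterName, 'Meter1')) ~> FilterMeter1
```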
The data preview of the filter transformation should now show only the Meter1 rows.
Add an aggregate transformation to keep the latest-date rows and configure it as follows:

- Group by: ID, MeterName
- Aggregates: ReadDate = max(ReadDate), Value = first(Value)
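In data flow script form, the aggregate step would look roughly like this (FilterMeter1 and Aggregate1 are again placeholder stream names):

```
FilterMeter1 aggregate(groupBy(ID,
        MeterName),
    ReadDate = max(ReadDate),
        Value = first(Value)) ~> Aggregate1
```

Note that first(Value) returns the first value the aggregate encounters per group, which is not guaranteed to belong to the latest ReadDate; if that matters, consider adding a sort transformation on ReadDate (descending) before the aggregate.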
The data preview of the aggregate transformation will then show the data as required (rows 1 and 3 from the example above).
If you want to overwrite the data in the PostgreSQL table, use an alter row transformation, select Upsert if, and write your condition (for example, true() to upsert every row). Then add a PostgreSQL dataset to the sink transformation, select Allow upsert, and select the key column (ID in this example).
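A rough data flow script sketch of the alter row and sink steps, assuming ID is the upsert key column (sink properties are abbreviated here and may differ from what the UI generates):

```
Aggregate1 alterRow(upsertIf(true())) ~> AlterRow1
AlterRow1 sink(allowSchemaDrift: true,
    validateSchema: false,
    insertable: false,
    updateable: false,
    upsertable: true,
    keys: ['ID']) ~> sink1
```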
Debug the pipeline with a data flow activity that uses the created data flow, and the data will be copied to the table successfully.