I am creating a Cloud Data Fusion pipeline that needs to pull incremental data from a database. My query to pull the data looks like this:
SELECT * FROM TABLE WHERE updated_date >${last_pipeline_run_time}
What is the best way to pass last_pipeline_run_time dynamically as a parameter to the Data Fusion pipeline? If you could suggest any other workaround, that would also be helpful. I am not using any other scheduler (like Airflow) for the time being.
I don't see such a parameter in Data Fusion pipelines. However, you could store this kind of metadata in a separate table and fetch it with the 'Database Argument Setter' plugin.
For example, the pipeline can be:
'Database Argument Setter' -> 'Database source' -> (other plugins for your pipeline) -> 'Database Action'
The 'Database Argument Setter' queries the metadata table for the timestamp and stores it as a runtime argument for this pipeline run, so the source query can reference it as ${last_pipeline_run_time}.
The 'Database Action' at the end of the pipeline updates the metadata table with the timestamp of the latest data loaded by this run, so that the next run loads data from that timestamp onward.