Search code examples
google-cloud-dataprep

Cloud Dataprep BigQuery Upsert


Is there a way to update rows in Google BigQuery when publishing from Cloud Dataprep? I can't find anything in the documentation.

I have a dataset I'm preprocessing with Dataprep that contains new rows and updated rows on every (daily) run. I would like to incrementally write it to BigQuery adding new rows and updating existing changed rows.


Solution

  • So Trifacta (CDP) can either append, replace or create new file each time the Job is run. You're refering to an Upsert\Merge action. So this can be done by 2 steps.

    1. for new rows, you can use the "append" run option
    2. for updating existing rows - 2 options.

    Either - A. do a Union transformation with the output file (already in bigquery), then deduplicate\aggregate and choose how you'd like to merge between the duped rows. this then can be done each time a new file is made\created\updated. B. create a new file, then through parameterization import all files within the folder - unioning them and do the grouping mentioned in A.

    Obviously - option A seems more simplistic.

    Does that make sense?