Search code examples
pythongoogle-bigqueryairflowetlgoogle-cloud-composer

Cloud Dataflow SQL: Merge Statement not allowed


I have two tables, both on Bigquery:

  1. Employee(id, name, age) with all the employees (ID is PK).
  2. Online(id, name, age) where new employees are added (let's say daily) and/or old ones are modified (ID is PK).

Example tables

Every day, I want to run a job (with Cloud Composer) that read ONLINE table and insert new employees or/and update old ones. This upsert operation is called MERGE on Bigquery. I noticed that the Dataflow SQL doesn't allow to run job on Bigquery with the MERGE statement.

Dataflow SQL MERGE statement

Is there an Airflow Operator (about Bigquery) that I can use instead of DataflowStartSqlJobOperator ? If yes: is it right to i) delete rows from EMPLOYEE if ID is in ONLINE; ii) add ONLINE rows into EPLOYEE?


Solution

  • Solved. You can use the BigQueryInsertJobOperator.
    Here there's a guide on how to use it.