google-cloud-platform google-bigquery dts data-integrity

Does the BigQuery Data Transfer Service (DTS) perform any data integrity checks (e.g., MD5 checksums) when copying datasets?


While using the BigQuery Data Transfer Service (DTS) to copy datasets within the same GCP project, we noticed that Cloud Logging provides few details about data integrity after the datasets have been migrated or copied.

Storage Transfer Service (STS), by contrast, has an option to enable logging, and with it we can see the MD5 checksum in the source object field after each bucket is copied.

Does DTS have such an option? If not, how does BigQuery DTS ensure data integrity when migrating datasets?

Is there a way, or a recommended approach, to implement MD5 checksum verification when using DTS?


Solution

  • MD5 checksums are not available in the BigQuery Data Transfer Service (DTS) when copying datasets. Instead, the row count is verified before and after the data is copied. A feature request has also been filed for this, since other customers have asked for the same feature.

    However, you can still see whether the jobs failed or succeeded in the BigQuery DTS UI, as well as in Cloud Logging. Also, copy jobs operate on individual tables, so a checksum would only be a summary of each individual table.
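
Since DTS does not expose a checksum itself, one possible workaround is to compute a per-table fingerprint on both sides after the copy and compare the results. The sketch below is only an illustration, not something DTS does: it builds a query that returns the row count plus an order-independent fingerprint, using the BigQuery functions FARM_FINGERPRINT, TO_JSON_STRING and BIT_XOR rather than MD5. The helper names (fingerprint_sql, tables_match) and the table names are hypothetical. Because XOR cancels out pairs of identical rows, the row count is compared as well.

```python
from typing import Mapping, Optional


def fingerprint_sql(table: str) -> str:
    """Build a query returning a row count and an order-independent fingerprint.

    `table` is a fully qualified name such as "project.dataset.table"
    (hypothetical here). Each row is serialized to JSON, hashed with
    FARM_FINGERPRINT, and the hashes are XOR-ed together, so row order
    does not affect the result.
    """
    return (
        "SELECT COUNT(*) AS row_count, "
        "BIT_XOR(FARM_FINGERPRINT(TO_JSON_STRING(t))) AS fingerprint "
        f"FROM `{table}` AS t"
    )


def tables_match(
    source: Mapping[str, Optional[int]],
    copy: Mapping[str, Optional[int]],
) -> bool:
    """Compare the result rows of the fingerprint query for two tables."""
    return (
        source["row_count"] == copy["row_count"]
        and source["fingerprint"] == copy["fingerprint"]
    )


if __name__ == "__main__":
    # Assumption: each printed query is run separately, for example with
    # google.cloud.bigquery.Client().query(sql).result(), and the single
    # result row of each is then passed to tables_match().
    print(fingerprint_sql("my-project.source_dataset.orders"))
    print(fingerprint_sql("my-project.copied_dataset.orders"))
```

Because copy jobs work table by table, this check would have to be repeated for every table in the dataset. Serializing every row can be expensive on large tables, so treat it as a spot check rather than a built-in guarantee.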