Search code examples
google-bigquerychecksum

Does BigQuery Has an api to return a Checksum for a exported table


I'm looking out if there exist a way we can get the checksum of the file (or table) that can be export from google bigquery.

Example assuming that I exported(or downloaded) a file called daily-record-2018-07-10 from bigquery, is there a way bigquery can provide us with some checksum (like sha1, md5 etc) so that we can precompute those checksum on our end before we start processing those files

The reason to do so is to be sure that we aren't losing any information when are downloading(or exporting) the table. We do lose some information over the network then we can precompute the checksum and attempt again to export the table.


Solution

  • It sounds like the question may be about how to checksum a file, and I'm sure there are plenty of resources about that, but if you want to checksum a table that is stored in BigQuery, you can use a query of this form:

    SELECT BIT_XOR(FARM_FINGERPRINT(TO_JSON_STRING(t)))
    FROM MyTable AS t
    

    This converts each row to a string representation, hashes, and then XORs the results to generate a checksum across the table. You can also use this on external tables (e.g. files on Cloud Storage or Drive).