I am looking for some help in regards to how I go about comparing 2 tables in BigQuery for UAT Testing.
Currently I have 6 columns in both tables and they both have the same column names in the same positions.
Create Table Scripts:
CREATE TABLE `project.mydataset.table_1` (
`ADDRESS_ID` STRING,
`ORDER_NO` STRING,
`START_DATE` STRING,
`END_DATE` STRING,
`JOB_DETAILS` STRING,
`LOAD_DATE` STRING
);
CREATE TABLE `project.mydataset.table_2` (
`ADDRESS_ID` STRING,
`ORDER_NO` STRING,
`START_DATE` STRING,
`END_DATE` STRING,
`JOB_DETAILS` STRING,
`LOAD_DATE` STRING
);
The following is an example of what the table looks like:
ADDRESS_ID | ORDER_NO | START_DATE | END_DATE | JOB_DETAILS | LOAD_DATE
000045 | AE12345 | 2019-04-03 00:00:00 | 2019-04-21 17:14:11 | With Engineer | 2019-04-02 00:00:00
Now what I would like to do, is to go through various checks that will give me stats for the differences between the 2 tables.
One of the first queries I have written was to check the volume I have in both tables for the 01/04/22 for example and this gave me the following volumes:
TABLE_1 = 4317
TABLE_2 = 4431
The next one was to check the duplicates in the 'ADDRESS_ID' column and this gave me the following:
TABLE_1 = 67
TABLE_2 = 61
But rather than writing individual queries, is there something that I can write that will give me a load of stats or differences i.e. that will check the 2 tables row by row with a join on the address key or something and then compare and then tell me the difference across the whole table?
Apologies I don't have anymore sample data, as the data is from my workplace.
Take a look at Google's Data Validation Tool. It is pretty decent at running tests between two tables of varying sources.