Search code examples
sqlgoogle-bigquerysql-updatesql-insertexecute-immediate

How update a table in Big Query and store the original value replaced and her difference with the new value


In a previous post, see here: How update a table in Big Query where the name of fields to update are values in another table

I had two table and the following problem:

Table 1:

+-------+------------+---------+
| ID    | field_name | value   |
+-------+------------+---------+
| 1     | usd        |  10.08  |
| 1     | gross_amt  |  52.0   |
| 1     | jpy        |  30.05  |
| 2     | usd        |  50.0   |
| 2     | eur        |  50.0   |
| 3     | real_amt   |  210.43 |
| 3     | total      |  320    |
| 4     | jpy        |  23.45  |
| 4     | name       |  john   |
| 4     | city       |  utah   |
+-------+------------+---------+

Table 2:

+-----+-------+-----------+----------+---------+------+-------+-------+-------+-----------+----------+-------+-----+----------+
| ID  |  name | last_name |   date1  | counrty | city |  usd  |  eur  |  jpy  | gross_amt | real_amt | total | ... | field200 |
+-----+-------+-----------+----------+---------+------+-------+-------+-------+-----------+----------+-------+-----+----------+
| 1   |  jane | doe       | 19900108 |   usa   | LA   | 9.08  | 0.00  | 29.05 | 50.0      |  52.0    | 900.0 | ... | value200 |
| 2   |  lane | smith     | 19900108 |   usa   | LA   | 40.8  | 40.0  | 0.00  | 100.0     |  70.0    | 290.0 | ... | value200 |
| 3   |  mike | hoffa     | 19900108 |   usa   | SF   | 5.05  | 0.00  | 0.00  | 10.0      |  25.0    | 100.0 | ... | value200 |
| 4   |  paul | doe       | 19900108 |   usa   | NY   | 1.00  | 0.00  | 29.05 | 45.0      |  55.0    | 110.0 | ... | value200 |
+-----+-------+-----------+----------+---------+------+-------+-------+-------+-----------+----------+-------+-----+----------+

Need to update the values of the fields in the table 2, which are in table 1 in column field_name, with the values of table 1 column value, both IDs are the same in both tables, beside that, the datatype of column value in table 1 are string, but the data type of the columns to update in the table 2 are diferent, especially the numbers(numeric, int64, float64)

The tables above are an example, table 2 of the real problem has 200 fields and in table 1 for an ID there can be up to 40 value modifications for thousands of records to be modified daily

BUT NOW I HAVE A NEW PROBLEM

In the new table1 (Table 3):

Table 3:

+-------+------------+---------+-----------------------+------------------+
| ID    | field_name | value   | value_replaced_table2 |      diff        |
+-------+------------+---------+-----------------------+------------------+
| 1     | usd        |  10.08  |   9.08                | abs(10.08-9.08)  |
| 1     | gross_amt  |  52.0   |  50.0                 | abs(52.0-50.0)   |
| 1     | jpy        |  30.05  |  29.05                | abs(30.05-29.05) |
| 2     | usd        |  50.0   |  40.08                | abs(50.0-40.0)   |
| 2     | eur        |  50.0   |  40.0                 |       ......     |
| 3     | real_amt   |  210.43 |  25.0                 |       ......     |
| 3     | total      |  320    | 100.0                 |       ......     |
| 4     | jpy        |  23.45  |  29.05                | abs(23.45-29.05) |
| 4     | name       |  john   |  paul                 |       john       |
| 4     | city       |  utah   |  NY                   |       utah       |
+-------+------------+---------+-----------------------+------------------+

I need insert into new table 1 (table 3) in the column value_replaced_table2 the value replaced in the table 2, thus storing the replaced value in table 2 above, and calculate the differences between both values, (the new value to update and the old value replaced in table 2, note that the data types in the new table 1 (table 3) are string, and those in table 2 are (numeric, int64, float64)

From now, thanks for your answers!


Solution

  • Using the previously created pivot1 table, you can use it before executing the final MERGE with table2 to get the old_values that will change.

    Then, you will need to unpivot the results to get table3. Example using your sample data :

    -- Same pivot1 table as before
    EXECUTE IMMEDIATE '''
    CREATE TEMP TABLE pivot1 AS
    SELECT id, ''' || (
      SELECT STRING_AGG(DISTINCT "MAX(IF(field_name = '" || field_name || "', CAST(value AS " || data_type || "), NULL)) AS " || field_name)
      FROM `project.dataset.table1`
      JOIN (
        SELECT column_name, data_type
        FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
        WHERE table_name = 'replica2' 
      ) ON field_name = column_name
    ) || '''  
    FROM `project.dataset.table1`
    GROUP BY id
    ''';
    
    -- Table3
    EXECUTE IMMEDIATE '''
    CREATE OR REPLACE TABLE project.dataset.table3 AS
    SELECT a.id, values.column_name as field_name, values.new_value as value, values.old_value as value_replaced_table2,
     CASE
        WHEN values.data_type = "STRING" THEN values.new_value
        WHEN values.data_type = "INT64" THEN CAST(ABS(CAST(values.new_value AS INT64) - CAST(values.old_value AS INT64)) AS STRING)
        ELSE CAST(ABS(CAST(values.new_value AS FLOAT64) - CAST(values.old_value AS FLOAT64)) AS STRING)
     END as diff
    FROM (
    SELECT t1.id, [''' || (
      SELECT STRING_AGG(DISTINCT "STRUCT('" || column_name || "' as column_name, CAST(t1." || column_name || " AS STRING) as new_value, CAST(t2." || column_name || " AS STRING) as old_value, '" || data_type || "' as data_type)") 
      FROM `project.dataset.table1`
      JOIN (
        SELECT column_name, data_type
        FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
        WHERE table_name = 'replica2' 
      ) ON field_name = column_name
    ) || '''] AS values
    FROM `project.dataset.table2` AS t2
    JOIN pivot1 AS t1
    ON t2.id = t1.id ) a
    CROSS JOIN UNNEST(a.values) as values
    WHERE values.new_value IS NOT NULL
    ''';
    
    SELECT * FROM `project.dataset.table3` ORDER BY id;
    

    Notice that the FLOAT64 difference will get an approximate value when casted to STRING, so if you want to round up the difference, you can use a cast to NUMERIC instead of FLOAT64, e.g.:

    ...
    ELSE CAST(ABS(CAST(values.new_value AS NUMERIC) - CAST(values.old_value AS NUMERIC)) AS STRING)
    ...
    -- Instead of 9.20000000000000028, it will appear as 9.2
    

    Results