Search code examples
sql-serverdatabasegoogle-bigquery

Send SQL Server DataTable to Google Big Query


I want a solution to send a C# SQL Server DataTable to a Google BigQuery procedure to join it with another table and make an update.

I didn't find anything similar to what happens with SQL Server procedures

CREATE PROCEDURE MyProcedure
  @MyTable MyTableType READONLY

Reading the GBQ docs seems that it is not possible to use structs in parameterized queries using C# library. Maybe it is possible to serialize data to JSON and send it to GBQ?

Then I could use this JSON data to make the join using syntax like the one described here.

Do you know if it's possible to accomplish such a GOAL?


Solution

  • We finally solved the problem by serializing the DataTable using

    JsonConvert.SerializeObject(DataTableChunk, Formatting.None)
    

    and calling the GBQ Procedure using this code:

    public static void BQExecuteProcedure(BigQueryClient Client, BigQueryRoutine Routine, BigQueryParameter[] Parameters)
    {
    
        StringBuilder ProcedureName = new StringBuilder();
        ProcedureName.AppendJoin('.', new string[] { Routine.Reference.DatasetId, Routine.Reference.RoutineId });
        ProcedureName.Append("(");
        if (Parameters != null)
        {
            ProcedureName.AppendJoin(", ", Parameters.Select(p => "@" + p.Name));
        }
        ProcedureName.Append(")");
                
        string s = ProcedureName.ToString();
        string p = null;
    
        Client.CreateQueryJob(sql: $"CALL {ProcedureName}", parameters: Parameters).PollUntilCompleted();
        if (!(QueryJob.Status.State.ToUpper().Equals("DONE")))
        {
            throw new Exception(QueryJob.Status.ErrorResult.ToString());
        }
    }
    

    The GBQ procedure looks like the following

    CREATE OR REPLACE PROCEDURE MyDataset.PROCEDURE_NAME(JsonSerializedRecords STRING)
    
    BEGIN
    
    UPDATE MyDataset.MY_TABLE MY_TABLE_ALIAS
    SET
        MY_TABLE_ALIAS.EXAMPLE_FIELD = INPUT_TABLE_ALIAS.UPDATE_FIELD
    FROM 
    (
        SELECT 
            JSON_EXTRACT_SCALAR(json_string, "$.UPDATEFIELD") AS UPDATE_FIELD
        FROM UNNEST(JSON_EXTRACT_ARRAY(JsonSerializedRecords)) AS json_string
    ) INPUT_TABLE_ALIAS
    WHERE
        MY_TABLE_ALIAS.FIELD_1 = INPUT_TABLE_ALIAS.FIELD_1 AND
        MY_TABLE_ALIAS.FIELD_2 = INPUT_TABLE_ALIAS.FIELD_2;
        
    END
    

    We had also to break up the input DataTable because an input parameter to a GBQ Script cannot be greater than 1MB.


    Previous Solution

    We previously opted for a permanent table GBQ side to send data to and then making an update with the following update statement:

    UPDATE MyTable MyTableAlias
    SET MyTableAlias.Field1 = BrandNewTableAlias.Field1
    FROM BrandNewTable BrandNewTableAlias
    WHERE
        MyTableAlias.Field1 = BrandNewTableAlias.Field1;
    

    GBQ does not allow for UPDATE statements on JOIN operations like SQL Server, therefore we had to use the syntax above, as explained here on StackOverflow and here on the Google Docs.