Search code examples
c#.net-coregoogle-bigquery

Composing non-flat schema in BigQuery with C#


I've been using the Google.Cloud.BigQuery.V2 nuget package and don't seem to be able to find any concrete examples of creating a non-trivial (by this, I mean non-flat) schema for table creation / row insertion.

Let's take this object as an example:

public class DataEntry
{
    public string Id { get; set; }

    public string Name { get; set; }

    public DateTime Created { get; set; }

    public Dictionary<string, string> Items { get; set; }
}

The Items dictionary may be null/empty or contain many entries.

From reading the docs, I'd expect .Items to be mapped as an ARRAY of STRUCT<string, string>, but I can't seem to figure out how to actually map this kind of object in code.

Can anyone point me to an example?

Thanks!


Solution

  • After a little more reading, it's become clear that whilst you can specify an ARRAY in your schema when adding a field of type BigQueryDbType.DateTime to a TableSchemaBuilder, it cannot be combined with a STRUCT to map a dictionary. The reason for this is, with hindsight, fairly obvious - a STRUCT is essentially like a System.Object in that it's properties (dictionary keys) need to be known ahead of time.

    In short, this will never work for my scenario since the dictionary key/value pairs are dynamic and won't be known beforehand.

    After digging into the supported query-time functions, I came across the JSON_EXTRACT function which provides a way out. By flattening the dictionary into json and storing it with a basetype of BigQueryDbType.String, I have a way to persist the dynamic data and query it on the other end.

    Armed with this information, the code to build a schema for the object above, would look something like this:

    // Build our object
    var builder = new TableSchemaBuilder();
    builder.Add("Id", BigQueryDbType.String, BigQueryFieldMode.Required);
    builder.Add("Name", BigQueryDbType.String, BigQueryFieldMode.Required);
    builder.Add("Inserted", BigQueryDbType.DateTime, BigQueryFieldMode.Required);
    // ...
    // ... Rest of fields
    // ...
    builder.Add("Items", BigQueryDbType.String, BigQueryFieldMode.Nullable);
    
    // Build our schema
    var schema = builder.Build();
    

    Assuming an Items value of {"KeyOne":"ItemOne", "KeyTwo":"ItemTwo"}, the query to tease out the first value in the dictionary would look something like this:

    SELECT JSON_EXTRACT(Items, "$.KeyOne") AS KeyOne FROM `mytable` LIMIT 1000
    

    Hopefully this will save someone else some time :)