I've been using the Google.Cloud.BigQuery.V2
nuget package and don't seem to be able to find any concrete examples of creating a non-trivial (by this, I mean non-flat) schema for table creation / row insertion.
Let's take this object as an example:
public class DataEntry
{
public string Id { get; set; }
public string Name { get; set; }
public DateTime Created { get; set; }
public Dictionary<string, string> Items { get; set; }
}
The Items
dictionary may be null/empty or contain many entries.
From reading the docs, I'd expect .Items
to be mapped as an ARRAY
of STRUCT<string, string>
, but I can't seem to figure out how to actually map this kind of object in code.
Can anyone point me to an example?
Thanks!
After a little more reading, it's become clear that whilst you can specify an ARRAY
in your schema when adding a field of type BigQueryDbType.DateTime
to a TableSchemaBuilder
, it cannot be combined with a STRUCT
to map a dictionary. The reason for this is, with hindsight, fairly obvious - a STRUCT
is essentially like a System.Object
in that it's properties (dictionary keys) need to be known ahead of time.
In short, this will never work for my scenario since the dictionary key/value pairs are dynamic and won't be known beforehand.
After digging into the supported query-time functions, I came across the JSON_EXTRACT
function which provides a way out. By flattening the dictionary into json and storing it with a basetype of BigQueryDbType.String
, I have a way to persist the dynamic data and query it on the other end.
Armed with this information, the code to build a schema for the object above, would look something like this:
// Build our object
var builder = new TableSchemaBuilder();
builder.Add("Id", BigQueryDbType.String, BigQueryFieldMode.Required);
builder.Add("Name", BigQueryDbType.String, BigQueryFieldMode.Required);
builder.Add("Inserted", BigQueryDbType.DateTime, BigQueryFieldMode.Required);
// ...
// ... Rest of fields
// ...
builder.Add("Items", BigQueryDbType.String, BigQueryFieldMode.Nullable);
// Build our schema
var schema = builder.Build();
Assuming an Items
value of {"KeyOne":"ItemOne", "KeyTwo":"ItemTwo"}
, the query to tease out the first value in the dictionary would look something like this:
SELECT JSON_EXTRACT(Items, "$.KeyOne") AS KeyOne FROM `mytable` LIMIT 1000
Hopefully this will save someone else some time :)