From the import documentation of BigQuery,
Note: Null values are not allowed
So I assume null
is not allowed in a json-formatted data for BigQuery import. However, null
value is actually very common in regular ETL task (due to missing data). What should be a good solution to import such json source files? Note my data contains nested structures so I do not prefer a conversion to CSV
and use ,,
to represent a null
value.
One way I think I can do is to replace all null
values with default values of different data types respectively, e.g.,
null
-> empty stringnull
-> -1null
-> -1.0But I don't like it. I am looking for better options.
BTW, I tried to do bq load
with a json file containing null
values. I get the below error:
Failure details:
- Expected '"' found 'n'
- Expected '"' found 'n'
- Expected '"' found 'n'
- Expected '"' found 'n'
- Expected '"' found 'n
...
I think this is the indication of null
usage, is it correct?
EDIT: If I remove all the null
fields, it seems to work. I guess this is the way to handle the null
data. You cannot have null
for a data field, but you can just not include it. So I need to have a filtering code to remove all the null
field in my raw json.
You can import NULL values using JSON format source files - omit the key:value pair for values that are NULL.
Example - Let's say you have a schema like this:
{
"name": "kind",
"type": "string"
},
{
"name": "fullName",
"type": "string",
},
{
"name": "age",
"type": "integer",
"mode": "nullable"
}
A record with no NULL values might look like this:
{"kind": "person",
"fullName": "Some Person",
"age": 22
}
However, when "age" is NULL, try this (note, no "age" key):
{"kind": "person",
"fullName": "Some Person",
}
Please let us know if you have issues with this. I'll make a note to improve the documentation around using NULL values with JSON import formats.