Search code examples
jsonnullgoogle-bigquerydata-import

Import json data with null values


From the import documentation of BigQuery,

Note: Null values are not allowed

So I assume null is not allowed in a json-formatted data for BigQuery import. However, null value is actually very common in regular ETL task (due to missing data). What should be a good solution to import such json source files? Note my data contains nested structures so I do not prefer a conversion to CSV and use ,, to represent a null value.

One way I think I can do is to replace all null values with default values of different data types respectively, e.g.,

  • string: null -> empty string
  • integer: null -> -1
  • float: null -> -1.0
  • ...

But I don't like it. I am looking for better options.

BTW, I tried to do bq load with a json file containing null values. I get the below error:

Failure details:
- Expected '"' found 'n'
- Expected '"' found 'n'
- Expected '"' found 'n'
- Expected '"' found 'n'
- Expected '"' found 'n
...

I think this is the indication of null usage, is it correct?

EDIT: If I remove all the null fields, it seems to work. I guess this is the way to handle the null data. You cannot have null for a data field, but you can just not include it. So I need to have a filtering code to remove all the null field in my raw json.


Solution

  • You can import NULL values using JSON format source files - omit the key:value pair for values that are NULL.

    Example - Let's say you have a schema like this:

    {
    "name": "kind",
    "type": "string"
    },
    {
    "name": "fullName",
    "type": "string",
    },
    {
    "name": "age",
    "type": "integer",
    "mode": "nullable"
    }
    

    A record with no NULL values might look like this:

    {"kind": "person",
     "fullName": "Some Person",
     "age": 22
    }
    

    However, when "age" is NULL, try this (note, no "age" key):

    {"kind": "person",
     "fullName": "Some Person",
    }
    

    Please let us know if you have issues with this. I'll make a note to improve the documentation around using NULL values with JSON import formats.