Search code examples
pythongoogle-bigquerypython-requestsgoogle-api-python-client

how to solve repeated field error in big-query inserted through python client?


I am using an google api which returns data in json format as below:

{
            "storageLocations": [
                "us"],      
            
"autoCreated": true,
            
"downloadBytes": "77557"

}

One of the fields returned is storageLocations which looks like an array type , so i have defined it as repeated field in bigquery.

I need to insert this data into big-query using python-big-query client , so for that field I have created the following structure in big-query. enter image description here

For creating each row, I add a empty field row={} and I loop through the json response and assign it like :

row["storageLocations.locations"]=response["storageLocations"]
  row["autoCreated"]= response["autoCreated"]

When data gets inserted

For storageLocations I am getting this error

 u'insertErrors': [{u'index': 0, u'errors': [{u'debugInfo': u'', u'reason': u'invalid', u'message': u'no
     such field.', u'location': u'storageLocations.locations'}]}

I have also tried row["storageLocations"]["locations"] and that is also not working.

As i am new to python and big-query am not sure of the error.

Thanks for your suggestions.


Solution

  • Typically, when you use a repeated record you tend to have more than one leaf field, or you plan to add more in the future.

    The JSON response you're working with looks like you may want to simply use an array of strings.

    For example:

    schema=[
        bigquery.SchemaField("storageLocations", "STRING", mode="REPEATED"),
        bigquery.SchemaField("autoCreated", "BOOL"),
        bigquery.SchemaField("downloadBytes", "INT64"),
    ]
    

    With the simpler schema, you should simply be able to send your list/tuple of string values when inserting rows e.g. row["storageLocations"] = ['a','b','c']

    Without this, you need to construct a much more complex row, as you have discovered.