google-bigquery, google-cloud-datalab

Datalab does not populate BigQuery tables


Hi, I have a problem using IPython notebooks on Datalab.

I want to write the result of a query into a BigQuery table, but it does not work. Everyone says to use the insert_data(dataframe) function, but it does not populate my table. To simplify the problem, I tried reading a table and writing the result to a newly created table (with the same schema), but that does not work either. Can anyone tell me where I am going wrong?

import gcp
import gcp.bigquery as bq

# read the data
df = bq.Query('SELECT 1 as a, 2 as b FROM [publicdata:samples.wikipedia] LIMIT 3').to_dataframe()

# create a dataset and derive the schema from the dataframe
dataset = bq.DataSet('prova1')
dataset.create(friendly_name='aaa', description='bbb')
schema = bq.Schema.from_dataframe(df)

# create the table
temptable = bq.Table('prova1.prova2').create(schema=schema, overwrite=True)

# try to insert the same data into the newly created table
temptable.insert_data(df)

Solution

  • Calling insert_data will do an HTTP POST and return once that is done. However, it can take some time for the data to show up in the BQ table (up to several minutes). Try waiting a while before using the table. We may be able to address this in a future update; see this.

    For now, a hacky way to block until the data is ready would be something like:

    import time

    while True:
      info = temptable._api.tables_get(temptable._name_parts)  # peeks at non-public internals
      # no streaming buffer left means the rows have been flushed into the table itself
      if 'streamingBuffer' not in info:
        break
      # estimatedRows comes back from the API as a string, so cast before comparing
      if int(info['streamingBuffer']['estimatedRows']) > 0:
        break
      time.sleep(5)
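
    If you would rather not spin forever when something goes wrong, the same poll can be capped with a deadline. This is only a sketch layered on the snippet above: the wait_for_streaming_rows helper and its 120-second/5-second defaults are invented for illustration, it pokes at the same non-public _api/_name_parts internals, and the final read-back assumes Table.to_dataframe() exists in this version of the gcp.bigquery API.

    import time

    def wait_for_streaming_rows(table, timeout_secs=120, poll_secs=5):
      # Poll until the streamed rows become visible or the timeout expires.
      # Returns True once the rows show up, False on timeout.
      # The 120s/5s defaults are arbitrary; tune them to your workload.
      deadline = time.time() + timeout_secs
      while time.time() < deadline:
        info = table._api.tables_get(table._name_parts)
        if 'streamingBuffer' not in info:
          return True  # buffer flushed; rows are in the table proper
        if int(info['streamingBuffer']['estimatedRows']) > 0:
          return True  # rows already queryable from the streaming buffer
        time.sleep(poll_secs)
      return False

    if wait_for_streaming_rows(temptable):
      df2 = temptable.to_dataframe()  # the inserted rows should now be visible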