I have a DataFrame df and I create a BigQuery table for it in Cloud Datalab:
import gcp
import gcp.bigquery as bq

# Create the schema, using the convenience of basing it on the example DataFrame
schema = bq.Schema.from_dataframe(df)
# Create the dataset
bq.DataSet('ids').create()
# Create the table
suri_table = bq.Table('ids.suri').create(schema=schema, overwrite=True)
# The default project of the Datalab context
project = gcp.Context.default().project_id
There is a pandas function, [to_gbq()][1], that I want to use to store the DataFrame:
df.to_gbq(df, 'ids.suri', project)
This raises a NotFoundException even though the table exists; I just created it in the code above. Could someone help me figure out what the problem really is?
NotFoundException: Invalid Table Name. Should be of the form 'datasetId.tableId'
If I do:
from pandas.io import gbq
df.to_gbq('ids.suri', project_id=project)
I get:
/usr/lib/python2.7/dist-packages/pkg_resources.pyc in resolve(self, requirements, env, installer, replace_conflicting)
637 # unfortunately, zc.buildout uses a str(err)
638 # to get the name of the distribution here..
--> 639 raise DistributionNotFound(req)
640 to_activate.append(dist)
641 if dist not in req:
DistributionNotFound: google-api-python-client
[1]: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.gbq.to_gbq.html
You are conflating the Cloud Datalab way with the gbq way. You should use one or the other. To do this from Cloud Datalab, once you have created the table as above, you can just use:
suri_table.insert_data(df)
There are a couple of options if you want to include the index, etc.; see http://googlecloudplatform.github.io/datalab/gcp.bigquery.html#gcp.bigquery.Table.insert_data
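If you also want the DataFrame index written to the table, a minimal sketch would look like the following; the include_index and index_name parameter names are taken from the linked gcp.bigquery docs and should be verified against your Datalab version:

# Insert only the DataFrame values into the table created earlier
suri_table.insert_data(df)
# Assumed parameters (see linked docs): also write the DataFrame index
# into a column, here named 'row_id'
suri_table.insert_data(df, include_index=True, index_name='row_id')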
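If you prefer the gbq way instead, the DistributionNotFound traceback above simply means google-api-python-client is not installed in your environment; a rough sketch of that route, assuming pandas 0.17 as in the linked docs, is:

# Prerequisite: pip install google-api-python-client
from pandas.io import gbq
# Per the linked pandas docs the call is to_gbq(dataframe, destination_table, project_id, ...);
# the destination is a 'datasetId.tableId' string, and the DataFrame is not passed a second time.
gbq.to_gbq(df, 'ids.suri', project_id=project)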