I have a DataFrame df and I create a BigQuery table for it in Cloud Datalab:
import gcp
import gcp.bigquery as bq

# Create the schema, using the convenience of basing it on the example DataFrame
schema = bq.Schema.from_dataframe(df)
# Create the dataset
bq.DataSet('ids').create()
# Create the table
suri_table = bq.Table('ids.suri').create(schema=schema, overwrite=True)
# The default project of the Datalab context
project = gcp.Context.default().project_id
There is a pandas function, [to_gbq()][1], that I want to use to store the DataFrame:
df.to_gbq(df, 'ids.suri', project)
This raises a NotFoundException even though the table exists; I just created it in the code above. Could someone help me figure out what the problem really is?
NotFoundException: Invalid Table Name. Should be of the form 'datasetId.tableId'
If I do:
from pandas.io import gbq
df.to_gbq('ids.suri', project_id=project)
I get:
/usr/lib/python2.7/dist-packages/pkg_resources.pyc in resolve(self, requirements, env, installer, replace_conflicting)
637 # unfortunately, zc.buildout uses a str(err)
638 # to get the name of the distribution here..
--> 639 raise DistributionNotFound(req)
640 to_activate.append(dist)
641 if dist not in req:
DistributionNotFound: google-api-python-client
[1]: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.gbq.to_gbq.html
You are conflating the Cloud Datalab way with the gbq way. You should use one or the other. To do this from Cloud Datalab, once you have created the table as above, you can just use:
suri_table.insert_data(df)
There are a couple of options if you want to include the index, etc.; see http://googlecloudplatform.github.io/datalab/gcp.bigquery.html#gcp.bigquery.Table.insert_data
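If you also want the DataFrame index written to the table, a minimal sketch would look like the following; the include_index and index_name parameter names are taken from the linked gcp.bigquery docs and should be verified against your Datalab version:

# Insert only the DataFrame values into the table created earlier
suri_table.insert_data(df)
# Assumed parameters (see linked docs): also write the DataFrame index
# into a column, here named 'row_id'
suri_table.insert_data(df, include_index=True, index_name='row_id')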
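If you prefer the gbq way instead, the DistributionNotFound traceback above simply means google-api-python-client is not installed in your environment; a rough sketch of that route, assuming pandas 0.17 as in the linked docs, is:

# Prerequisite: pip install google-api-python-client
from pandas.io import gbq
# Per the linked pandas docs the call is to_gbq(dataframe, destination_table, project_id, ...);
# the destination is a 'datasetId.tableId' string, and the DataFrame is not passed a second time.
gbq.to_gbq(df, 'ids.suri', project_id=project)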