Tags: google-bigquery, google-cloud-storage, google-cloud-datalab

Google BigQuery: "Invalid source format CSV" when loading data from Datalab


I have some files stored in Google Cloud Storage, and I run the following from Google Cloud Datalab's Notebook:

import os

path = r'data/path/to/csv/orders_1234'
orders_path = os.path.join(bucket_path, path)  # bucket_path is defined earlier in the notebook
print(orders_path)
dest = 'project_name:dataset_name.orders'
%bigquery load -m append -f csv -d '\t' -S $orders_path -D $dest

But I get the error "Invalid source format CSV" when running this code. Do you know what I may have done incorrectly here?

Thank you!


Solution

  • Please try the following working example taken from the datalab sample notebooks repository.

    import datalab.bigquery as bq
    import datalab.storage as storage
    import pandas as pd
    from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO
    
    # Create the schema, conveniently using a DataFrame example.
    %storage read --object gs://cloud-datalab-samples/cars.csv --variable cars
    df = pd.read_csv(StringIO(cars))
    schema = bq.Schema.from_dataframe(df)
    
    # Create the dataset
    bq.Dataset('sample').create()
    
    # Create the table
    sample_table = bq.Table('sample.cars').create(schema=schema, overwrite=True)
    
    # Load the CSV file from GCS into Google BigQuery
    sample_table.load('gs://cloud-datalab-samples/cars.csv', mode='append',
                      source_format='csv', csv_options=bq.CSVOptions(skip_leading_rows=1))
    

    Please note: if you are running an older version of datalab, you may have to use import gcp.bigquery as bq instead of import datalab.bigquery as bq.
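    The example above infers the table schema from a pandas DataFrame. Since the asker's file is tab-delimited rather than comma-delimited, that trick only works if pandas is told the delimiter; otherwise each row is read as a single column and the inferred schema is wrong. A minimal sketch, using hypothetical sample data (the column names are assumptions, not taken from the asker's file):

    ```python
    import io

    import pandas as pd

    # Hypothetical first rows of a tab-delimited orders file.
    sample = "order_id\tproduct\tquantity\n1001\twidget\t3\n1002\tgadget\t5\n"

    # read_csv defaults to a comma delimiter; pass sep='\t' so the DataFrame,
    # and any schema derived from it, reflects the real columns.
    df = pd.read_csv(io.StringIO(sample), sep='\t')
    print(list(df.columns))  # ['order_id', 'product', 'quantity']
    ```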

    You may be able to see additional error message information on the Job History page in the BigQuery console.
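    As a quick local sanity check before loading, you can also confirm that the file's delimiter really matches the -d '\t' flag, since a delimiter mismatch is a common cause of CSV load failures. A minimal sketch using only the standard library, with hypothetical sample rows standing in for the asker's file:

    ```python
    import csv
    import io

    # Hypothetical first rows of the tab-delimited orders file.
    sample = "order_id\tproduct\tquantity\n1001\twidget\t3\n1002\tgadget\t5\n"

    rows = list(csv.reader(io.StringIO(sample), delimiter='\t'))

    # With the right delimiter, every row splits into the same number of fields;
    # a set with more than one size (or rows of length 1) suggests the
    # delimiter passed to the load command is wrong.
    field_counts = {len(row) for row in rows}
    print(field_counts)  # {3}
    ```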