google-cloud-datalab

How can I load my CSV file from Google Cloud Datalab into a pandas DataFrame?


Here is what I tried (IPython notebook, Python 2.7):

    import gcp
    import gcp.storage as storage
    import gcp.bigquery as bq
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np

    sample_bucket_name = gcp.Context.default().project_id + '-datalab'
    sample_bucket_path = 'gs://' + sample_bucket_name
    sample_bucket_object = sample_bucket_path + '/myFile.csv'
    sample_bucket = storage.Bucket(sample_bucket_name)
    df = bq.Query(sample_bucket_object).to_dataframe()

This fails.
Do you have any idea what I am doing wrong?


Solution

  • In addition to @Flair's comments about %gcs, I got the following to work for the Python 3 kernel:

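        import pandas as pd
        from io import BytesIO

        # Read the Cloud Storage object into a notebook variable as raw bytes
        %gcs read --object "gs://[BUCKET ID]/[FILE].csv" --variable csv_as_bytes

        # pd.read_csv expects a path or file-like object, so wrap the bytes in BytesIO
        df = pd.read_csv(BytesIO(csv_as_bytes))
        df.head()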
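The key step in the answer is wrapping the downloaded bytes in `BytesIO`, because `pd.read_csv` expects a file path or a file-like object. Below is a minimal sketch of just that parsing step, runnable outside Datalab. The helper name `csv_bytes_to_dataframe` and the sample CSV bytes are hypothetical stand-ins for what `%gcs read` would place in `csv_as_bytes`; no bucket is accessed.

```python
from io import BytesIO

import pandas as pd


def csv_bytes_to_dataframe(csv_as_bytes, **read_csv_kwargs):
    """Parse raw CSV bytes into a DataFrame (hypothetical helper).

    pd.read_csv takes a path or a file-like object, so the bytes are
    wrapped in an in-memory binary buffer before parsing. Any extra
    keyword arguments are passed straight through to pd.read_csv.
    """
    return pd.read_csv(BytesIO(csv_as_bytes), **read_csv_kwargs)


# Hypothetical stand-in for the bytes that
# `%gcs read ... --variable csv_as_bytes` would create in the notebook.
csv_as_bytes = b"name,count\nalpha,1\nbeta,2\n"

df = csv_bytes_to_dataframe(csv_as_bytes)
print(df.head())
```

In the notebook you would drop the hard-coded `csv_as_bytes` line and call the helper after the `%gcs read` magic; options such as `sep` or `encoding` pass through unchanged to `pd.read_csv`.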