python, csv, google-cloud-storage, google-cloud-datalab

How to read data from Google Cloud Storage into Google Cloud Datalab


I have a few CSV files stored in Google Cloud Storage and I want to read them into Google Cloud Datalab. So far, I have no idea how to do it. I found this and followed the first answer, but it didn't work and raised

  File "<ipython-input-1-5e9607fa3f65>", line 5
    %%gcs read --object $data_csv --variable data
    ^
SyntaxError: invalid syntax

Any help will be appreciated.


Solution

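  • If you remove one of the % symbols it should work. The double-% form (%%gcs) is a cell magic, which IPython only accepts as the first line of a cell, so it raises a SyntaxError when it appears after regular Python statements. The single-% form is a line magic and can be mixed with Python code. Minimal example: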

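    import google.datalab.storage as storage
    import pandas as pd
    from io import BytesIO
    
    # Reference the bucket and the CSV object inside it
    # (replace BUCKET_NAME and data.csv with your own names)
    mybucket = storage.Bucket('BUCKET_NAME')
    data_csv = mybucket.object('data.csv')
    
    # Line magic (single %): read the object into the Python variable `data`
    uri = data_csv.uri
    %gcs read --object $uri --variable data
    
    # `data` holds raw bytes, so wrap it in BytesIO before handing it to pandas
    df = pd.read_csv(BytesIO(data))
    df.head()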
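
    The last step works because the magic stores the object's contents in data as raw bytes, and BytesIO turns those bytes into a file-like object that pd.read_csv can consume. A minimal sketch of that step on its own, which can be tried without a bucket: the sample bytes and the helper name bytes_to_dataframe are made up for illustration and are not part of Datalab.

```python
from io import BytesIO

import pandas as pd

# Hypothetical sample standing in for the bytes that
# `%gcs read ... --variable data` would store in `data`.
data = b"name,score\nalice,90\nbob,85\n"


def bytes_to_dataframe(raw):
    """Parse raw CSV bytes into a DataFrame (illustrative helper)."""
    # BytesIO exposes the bytes through a file-like interface for pandas.
    return pd.read_csv(BytesIO(raw))


df = bytes_to_dataframe(data)
print(df.head())
```

    If the parsing fails on real data, checking type(data) first is a quick way to confirm whether the variable holds bytes or already-decoded text; for text, io.StringIO would be the matching wrapper.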