How do I open a gzip file in Google Datalab?

I have a bucket that contains a file.csv.gz. It's around 210MB and I'd like to read it into pandas. Anyone know how to do that?

For a non-gz, this works:

%gcs read --object gs://[bucket-name]/[path/to/file.csv] --variable csv

# Store in a pandas dataframe
df = pd.read_csv(StringIO(csv))

Solution

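You can still use pandas.read_csv, but you have to specify compression='gzip' and wrap the downloaded contents in a file-like object. StringIO from pandas.compat works with Python 2-era pandas, but it has been removed from newer pandas releases. Because gzip data is binary, io.BytesIO is the right wrapper there.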

I tried the code below with a small file in my Datalab, and it worked for me.

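%gcs read --object gs://[bucket-name]/[path/to/file.csv.gz] --variable my_file

import pandas as pd
from io import BytesIO  # replaces pandas.compat.StringIO, which newer pandas no longer ships

# my_file holds the compressed bytes; pandas decompresses them while parsing
df = pd.read_csv(BytesIO(my_file), compression='gzip')
df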
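If you want to check the approach outside Datalab, here is a minimal sketch. It assumes `my_file` holds the raw gzip bytes that `%gcs read` downloads. It stands in for them by gzipping a tiny made-up CSV in memory, so there is no bucket or `%gcs` call involved. It also shows a manual alternative: decompress with the standard-library `gzip` module, then parse the text with `io.StringIO`.

```python
import gzip
from io import BytesIO, StringIO

import pandas as pd

# Hypothetical stand-in for the bytes `%gcs read` would store in my_file:
# a small CSV, gzipped in memory so the example runs anywhere.
csv_text = "name,count\napple,3\nbanana,5\n"
my_file = gzip.compress(csv_text.encode("utf-8"))

# Option 1: let pandas decompress; the buffer must be binary (BytesIO).
df = pd.read_csv(BytesIO(my_file), compression="gzip")

# Option 2: decompress yourself, decode to text, then parse via StringIO.
df_manual = pd.read_csv(StringIO(gzip.decompress(my_file).decode("utf-8")))

print(df)
```

Both options should produce the same DataFrame. Option 1 keeps the code shortest. Option 2 can help when you want to inspect or fix the decompressed text (for example, its encoding) before pandas parses it. For a file around 210MB, either way holds the compressed bytes, the decompressed data, and the DataFrame in memory at once, so check that the notebook VM has enough RAM.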