I have a bucket that contains a file.csv.gz. It's around 210 MB and I'd like to read it into pandas.
Anyone know how to do that?
For a non-gzipped file, this works:

%gcs read --object gs://[bucket-name]/[path/to/file.csv] --variable csv

import pandas as pd
from io import StringIO

# Store the contents in a pandas dataframe
df = pd.read_csv(StringIO(csv))
You can still use pandas.read_csv, but you have to specify compression='gzip' and import StringIO from pandas.compat.
I tried the code below with a small file in my Datalab notebook, and it worked for me:
%gcs read --object gs://[bucket-name]/[path/to/file.csv.gz] --variable my_file

import pandas as pd
from pandas.compat import StringIO

# compression='gzip' tells pandas to decompress the data before parsing
df = pd.read_csv(StringIO(my_file), compression='gzip')
df