python parquet pyarrow

Pyarrow Dataset read specific columns and specific rows


Is there a way to use a pyarrow Parquet dataset to read only specific columns and, if possible, filter rows, instead of reading the whole file into a DataFrame?


Solution

  • As of pyarrow==2.0.0, this is possible with pyarrow.parquet.ParquetDataset.

    To read specific columns, pass the columns option to its read or read_pandas method. pandas.read_parquet accepts a columns option as well.

    To read specific rows, pass the filters option to its constructor (__init__). It takes a list of (column, operator, value) tuples.