I need to read parquet files from multiple directories.
For example:

Dir
|-- dir1
|   |-- *.parquet
|   |-- *.parquet
|-- dir2
    |-- *.parquet
    |-- *.parquet
    |-- *.parquet
Is there a way to read all of these files into a single pandas DataFrame?
Note: all of the parquet files were generated using PySpark.
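Use pd.read_parquet in a list comprehension over every file found by glob, then combine the results with pd.concat. The ** pattern (Python 3.5+) only searches subdirectories at any depth when recursive=True is passed:
import glob

import pandas as pd

# recursive=True makes ** match any number of nested subdirectories
files = sorted(glob.glob('Dir/**/*.parquet', recursive=True))
# ignore_index=True renumbers rows 0..n-1 instead of repeating each file's index
df = pd.concat([pd.read_parquet(fp) for fp in files], ignore_index=True)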