
Efficiently loading list of parquet files with python pandas


I am trying to load a large number of parquet files in python pandas and noticed a significant performance difference between two approaches. Specifically,

pd.read_parquet("/path/to/directory/")

is more than twice as fast as something like:

import glob
import pandas as pd

filelist = glob.glob("/path/to/directory/*")
pd.concat([pd.read_parquet(i) for i in filelist])

The reasons I want to use the 2nd approach include pre-filtering the parquet files to be loaded, or loading from multiple directories (that contain parquet files with the same format, etc.).
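For illustration, a minimal sketch of that use case under the 2nd approach (the directory names and the filename filter here are hypothetical):

import glob

import pandas as pd

# Collect parquet files from several directories, keeping only a
# subset of them based on the file name (hypothetical filter)
filelist = [
    f
    for d in ("/path/to/dir_a/", "/path/to/dir_b/")
    for f in glob.glob(d + "*.parquet")
    if "2023" in f
]
df = pd.concat([pd.read_parquet(f) for f in filelist], ignore_index=True)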

Any tips / guidance appreciated - basically looking to understand how to make the 2nd approach as performant as the first (and/or understanding what kind of magic might be making the 1st approach faster).


Solution

  • the function pyarrow.parquet.read_table (which pd.read_parquet calls under the hood with the pyarrow engine):

    • uses an IO thread pool in C++ to load files in parallel.
    • concatenates the different files into one table in Arrow, which is faster than doing it in pandas (pandas isn't very good at concatenating); see the sketch below.

    It's not very well documented, but pd.read_parquet can also accept a list of file names, so you get all of that speed-up:

    import glob
    import pandas as pd

    filelist = glob.glob("/path/to/directory/*")
    pd.read_parquet(filelist)
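
    For reference, a rough sketch of the equivalent at the pyarrow level (assuming the pyarrow engine and data that fits in memory); pq.read_table also accepts a list of paths, reads the files in parallel, and concatenates them as Arrow tables before a single conversion to pandas:

    import glob

    import pyarrow.parquet as pq

    filelist = glob.glob("/path/to/directory/*")

    # Files are read through pyarrow's IO thread pool and concatenated
    # as Arrow tables, not as pandas DataFrames
    table = pq.read_table(filelist)
    df = table.to_pandas()

    Since the list can mix paths from different directories, this also covers the multiple-directory use case from the question.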