I have multiple parquet files named file00.parquet, file01.parquet, file02.parquet, and so on. All the files follow the same schema as file00.parquet.
How do I append the files one below the other, starting from file00 onwards, in that same order using PySpark?
Assuming all the parquet files are in the same directory and share the same schema, you can read them all at once by pointing the reader at the directory:
# Files laid out like:
#   /root/to/data/file00.parquet
#   /root/to/data/file01.parquet
#   ...
df = spark.read.parquet("/root/to/data/")
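One caveat: reading a whole directory does not guarantee that rows come back in file-name order. If the order matters, one option is to build an explicitly sorted list of paths first and pass it to `spark.read.parquet`, which accepts multiple paths. A minimal sketch of the sorting step, using only the standard library (the temporary directory and file names here just mimic the question's layout for illustration):

```python
import glob
import os
import tempfile

# Create a throwaway directory with files named like the question's
# (file00.parquet, file01.parquet, ...) to demonstrate ordering.
tmp = tempfile.mkdtemp()
for name in ["file02.parquet", "file00.parquet", "file01.parquet"]:
    open(os.path.join(tmp, name), "w").close()

# Lexicographic sort puts file00 first, then file01, and so on,
# because the zero-padded numbering sorts naturally as strings.
paths = sorted(glob.glob(os.path.join(tmp, "file*.parquet")))
print([os.path.basename(p) for p in paths])

# spark.read.parquet accepts multiple paths, so the ordered list can be
# passed directly (assuming a SparkSession named `spark` exists):
# df = spark.read.parquet(*paths)
```

Even with an ordered path list, Spark DataFrames have no inherent row order, so rely on this only for the order in which files are ingested, not for downstream ordering guarantees.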
If you want to save them as a single parquet file, you can repartition to one partition before writing:
df.repartition(1).write.save(save_path, format='parquet')