
How to combine multiple CSV files into one file when the column sequence differs or some of the files have no header


For example, I have over 300 files in a nested folder and I have to combine all of them using PySpark or Python pandas.

File1: Date,channel,spend,clicks
File2: date ,channel,clicks,spend
File3: no header
File4: some extra columns in addition to the mandatory ones
Etc.

I am expecting a single file combining all the files in the folder, despite their different structures.


Solution

  • You can enforce a schema object to take care of files with no headers and unify the structure using spark.read.schema(schemaObject).csv(filesPath).
    To fit all records into one output file, call coalesce(1) on the DataFrame before writing (coalesce is a DataFrame method, not a writer method): df.coalesce(1).write.csv(destinationPath)
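Since the question also mentions pandas, here is a minimal sketch of the same idea in pure pandas. It assumes the mandatory columns are date, channel, spend, and clicks, and uses a simple heuristic (whether the first line names a known column) to detect headerless files; both the column list and the heuristic are assumptions you would adapt to your data.

```python
import glob
import os
import pandas as pd

# Assumed mandatory columns; adjust to your actual files.
EXPECTED = ["date", "channel", "spend", "clicks"]

def read_any(path):
    """Read one CSV, tolerating a missing header, a different
    column order, and extra columns."""
    # Peek at the first line to guess whether a header row is present.
    with open(path) as f:
        first_line = f.readline().lower()
    has_header = "channel" in first_line  # heuristic: header names a known column

    if has_header:
        df = pd.read_csv(path)
        # Normalize names such as "Date" or "date " to match EXPECTED.
        df.columns = [c.strip().lower() for c in df.columns]
    else:
        # No header: assume the columns appear in the mandatory order.
        df = pd.read_csv(path, header=None, names=EXPECTED)

    # Align column order, drop extras, and fill missing columns with NaN.
    return df.reindex(columns=EXPECTED)

def combine(folder, out_path):
    """Combine every CSV under folder (recursively) into one file."""
    files = glob.glob(os.path.join(folder, "**", "*.csv"), recursive=True)
    combined = pd.concat([read_any(f) for f in files], ignore_index=True)
    combined.to_csv(out_path, index=False)
    return combined
```

The reindex call is what unifies the differing structures: it reorders columns by name, silently discards any extra columns, and inserts NaN for columns a file lacks.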