For example, I have over 300 files in a nested folder and I want to combine all of them using PySpark or Python pandas.

File1: date, channel, spend, clicks
File2: date, channel, clicks, spend
File3: no header
File4: some extra columns in addition to the mandatory ones
Etc.

I am expecting a single file combining all the files in the folder, even though they have different structures.
You can enforce a schema object to take care of the files with no headers and unify the structure: spark.read.schema(SchemaObject).csv(FilesPath). One caveat: with an explicit schema and header=false, Spark maps CSV columns to schema fields by position, so files whose columns appear in a different order need to be read separately (or reordered) before being unioned.
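A minimal sketch of that read, assuming the mandatory columns are date, channel, spend, and clicks; the schema fields, types, and folder path below are placeholders you would adjust to your data:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField,
                               DateType, StringType, DoubleType, LongType)

spark = SparkSession.builder.appName("combine_csv_files").getOrCreate()

# Hypothetical schema for the mandatory columns; edit names/types to match your files.
schema = StructType([
    StructField("date", DateType(), True),
    StructField("channel", StringType(), True),
    StructField("spend", DoubleType(), True),
    StructField("clicks", LongType(), True),
])

# recursiveFileLookup picks up CSVs in nested subfolders.
# With an explicit schema and header=false, columns are mapped by position,
# so this variant fits the headerless files that share this column order.
df = (spark.read
      .schema(schema)
      .option("header", "false")
      .option("recursiveFileLookup", "true")
      .csv("/path/to/nested/folder"))  # placeholder path
```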
You can use coalesce(1) before writing out to fit all records into one file: df.coalesce(1).write.csv(DestinationPath). Note that coalesce is a DataFrame method, not a writer method, so it comes before .write.
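A sketch of the write step, assuming df is the unified DataFrame from above; the destination path is a placeholder, and keep in mind Spark writes a directory containing a single part-*.csv file rather than one named file:

```python
# coalesce(1) collapses the data into a single partition so the output
# directory contains exactly one part file. Fine for modest data sizes,
# but it funnels everything through one task, so avoid it for huge datasets.
(df.coalesce(1)
   .write
   .mode("overwrite")
   .option("header", "true")
   .csv("/path/to/output"))  # placeholder destination directory
```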