I am trying to read a list of CSV files from Azure Data Lake one by one and, after some checking, union them all into a single dataframe.
fileList = dbutils.fs.ls(file_input_path)
for i in fileList:
    try:
        file_path = i.path
        print(file_path)
    except Exception as e:
        raise Exception(str(e))
In this case, I want to read each CSV from file_path with a custom schema and union all of them into a single dataframe.
I could only read one CSV, as below. How do I read each and every CSV and union them all into one single dataframe?
df = spark.read.csv(file_path, header = True, schema=custom_schema)
How can I achieve this efficiently? Thanks.
I managed to read and union them as below.
fileList = dbutils.fs.ls(file_input_path)
# Start from an empty DataFrame with the target schema, then union each file in.
output_df = spark.createDataFrame([], schema=custom_schema)
for i in fileList:
    try:
        file_path = i.path
        df = spark.read.csv(file_path, header=True, schema=custom_schema)
        output_df = output_df.union(df)
    except Exception as e:
        raise Exception(str(e))
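As a footnote for anyone finding this later: calling union once per file grows the query plan one step at a time and requires the empty seed DataFrame. A minimal sketch of two alternatives, assuming every file matches custom_schema (untested against the original data lake):

from functools import reduce
from pyspark.sql import DataFrame

# Collect all file paths from the listing (same dbutils call as above).
file_paths = [f.path for f in dbutils.fs.ls(file_input_path)]

# Option 1: read each file, then fold the DataFrames together in one pass.
dfs = [spark.read.csv(p, header=True, schema=custom_schema) for p in file_paths]
output_df = reduce(DataFrame.union, dfs)

# Option 2: spark.read.csv also accepts a list of paths, so the union
# can be skipped entirely when every file shares the same schema.
output_df = spark.read.csv(file_paths, header=True, schema=custom_schema)

Option 2 avoids building a union plan altogether, since Spark reads all the listed files as one DataFrame; Option 1 keeps the per-file reads if you still need to do per-file checking before combining.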