Is there any way in PySpark to write a DataFrame to the Parquet file format using a variable's value as the output directory name, when that variable is not part of the DataFrame schema?
Code
tables_list = ['abc', 'def', 'xyz']
for table_name in tables_list:
    df.write.parquet(os.path.join("s3://bucket/output/"), table_name)
Error
table_name (abc, def, xyz) is not part of the schema.
Looks like there is a syntax error: the closing parenthesis of os.path.join is in the wrong place, so table_name is passed as a separate argument to parquet() instead of being joined into the path. That is why Spark treats it as a column name and complains that it is not part of the schema. Your code should work if you move the second closing parenthesis to the end of the line:
df.write.parquet(os.path.join("s3://bucket/output/", table_name))
I didn't try it on S3, but the code below creates an "abc" directory under "/tmp" on HDFS.
import os

tables_list = ['abc']
for table_name in tables_list:
    df.write.parquet(os.path.join("/tmp", table_name))
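For completeness, here's a minimal self-contained sketch of the full loop against S3. It's hedged: the bucket path, the mode("overwrite") choice, and the spark.table() lookup used to get one DataFrame per table name are assumptions for illustration, not part of the original question; substitute however your pipeline actually builds each table's DataFrame.

import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("write-tables").getOrCreate()

tables_list = ['abc', 'def', 'xyz']

# Hypothetical: one DataFrame per table name, looked up from the
# catalog here; replace with however your pipeline produces them.
dfs = {name: spark.table(name) for name in tables_list}

for table_name in tables_list:
    # table_name goes inside os.path.join, so it becomes part of the
    # output path rather than a second argument to parquet()
    output_path = os.path.join("s3://bucket/output", table_name)
    dfs[table_name].write.mode("overwrite").parquet(output_path)

Each table then lands under its own prefix, e.g. s3://bucket/output/abc/, with no requirement that the table name exist as a column in the DataFrame.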